findi-descent

FinDi: Finite Difference Gradient Descent can optimize any function, including those without an analytic form, by employing finite difference numerical differentiation within a gradient descent algorithm.


Keywords
optimization, gradient-descent, numerical-analysis, numerical-differentiation
License
MIT
Install
pip install findi-descent==0.1.0

Documentation

Difference Gradient Descent

Introduction

Difference Gradient Descent (DGD) extends the Gradient Descent method for convex optimization to objective functions whose gradient cannot be found analytically. It does so by approximating the derivatives as $\frac{\partial F}{\partial X} \approx \frac{\Delta F}{\Delta X}$, where $F$ is the objective function and $X$ is the parameter vector to be estimated. In n-dimensional space, the approximated gradient is given by

$$ \begin{bmatrix} \frac{\partial F}{\partial x_1} \\ \frac{\partial F}{\partial x_2} \\ \vdots \\ \frac{\partial F}{\partial x_n} \end{bmatrix} \approx \begin{bmatrix} \frac{\Delta F}{\Delta x_1} \\ \frac{\Delta F}{\Delta x_2} \\ \vdots \\ \frac{\Delta F}{\Delta x_n} \end{bmatrix} $$

In the implementation, $\Delta X := X_{t} - X_{t-1}$ is specified by the difference argument, and $\Delta F$ is computed by evaluating the objective function: $\Delta F := F(X_{t}) - F(X_{t-1})$.
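
The following is a minimal sketch of the idea, not the library's internals: each partial derivative is approximated by perturbing one coordinate by a small step and re-evaluating the objective. The helper names approximate_gradient and finite_difference_descent are illustrative only.

import numpy as np

def approximate_gradient(objective, params, difference):
    # Forward-difference approximation of the gradient:
    # for each coordinate i, (F(x + h * e_i) - F(x)) / h, where h is `difference`.
    base = objective(params)
    gradient = np.zeros_like(params, dtype=float)
    for i in range(len(params)):
        perturbed = params.copy()
        perturbed[i] += difference
        gradient[i] = (objective(perturbed) - base) / difference
    return gradient

def finite_difference_descent(objective, params, difference, learning_rate, epochs):
    # Plain gradient descent driven by the approximated gradient.
    params = np.asarray(params, dtype=float)
    for _ in range(epochs):
        gradient = approximate_gradient(objective, params, difference)
        params = params - learning_rate * gradient
    return params

# Example: minimize (x + 2)^2, whose minimum is at x = -2
solution = finite_difference_descent(lambda p: (p[0] + 2) ** 2,
                                     params=[5.0],
                                     difference=1e-4,
                                     learning_rate=0.1,
                                     epochs=500)
print(solution)  # approximately [-2.]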

A Quick Example

from difference_gradient_descent import DifferenceGradientDescent
from schedules import Schedules

# Defining the objective function
def foo(params):
    return [(params[0]+2)**2]

optimizer = DifferenceGradientDescent(objective_function=foo)

# Creating decay schedules for difference values and learning rates
scheduler = Schedules(n_steps=1000)
differences = scheduler.exponential_decay(initial_value=0.01,
                                          decay_rate=0.0005)

rates = scheduler.exponential_decay(initial_value=0.01,
                                    decay_rate=0.5)

# Running the algorithm
outputs, parameters = optimizer.difference_gradient_descent(initial_parameters=[5],
                                                            differences=differences,
                                                            learning_rates=rates,
                                                            epochs=1000)

print("Solutions: ", parameters[-1])
print("Objective value: ", outputs[-1])

# Saves values of outputs and parameters as a Pandas DataFrame...
values = optimizer.values_out(outputs,
                              parameters,
                              ["x"])
# ...and stores them as a CSV file
values.to_csv("values.csv")

Features and Advantages

  1. Can optimize objective functions that cannot be solved analytically or whose analytical gradient cannot be found
  2. Supports parallelization via the joblib library
  3. Object-oriented approach makes it convenient to try out different learning rates, difference values, and algorithms
  4. Partial Gradient Descent makes high-dimensional, simple problems less computationally expensive to solve (see the sketch after this list)
  5. Built-in support for variable learning rates and differences
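
As a rough illustration of the idea behind Partial Gradient Descent (a conceptual sketch, not the library's implementation), only a randomly sampled subset of coordinates is perturbed in a given epoch, while the remaining partial derivatives are treated as zero for that step. The function name partial_gradient_step is illustrative only.

import numpy as np

def partial_gradient_step(objective, params, difference, learning_rate, n_sampled):
    # One descent step that approximates only `n_sampled` randomly chosen
    # partial derivatives; the rest are treated as zero for this step.
    params = np.asarray(params, dtype=float)
    base = objective(params)
    gradient = np.zeros_like(params)
    sampled = np.random.choice(len(params), size=n_sampled, replace=False)
    for i in sampled:
        perturbed = params.copy()
        perturbed[i] += difference
        gradient[i] = (objective(perturbed) - base) / difference
    return params - learning_rate * gradient

This trades gradient accuracy for fewer objective evaluations per epoch, which is what makes high-dimensional but simple problems cheaper to optimize.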