# Deep Learning Toy

A lightweight deep learning library implemented in Python, designed for studying how contemporary deep learning libraries are implemented.

## Architecture

The framework is built around several core ideas: the computational graph, forward propagation, the loss (cost) function, gradient descent, and backward propagation.
*Computational graph* is a graph representing an ordered set of primitive algebraic operations. *Forward propagation* feeds an input into a computational graph and produces the output. *Loss function* is a metric measuring how well a model estimates a class or a value from the input; usually, a loss function produces a scalar value. *Gradient descent* is an iterative method for minimizing the loss function: at each step, every variable is moved a small amount in the direction opposite to the gradient of the loss with respect to that variable, because that is the direction in which the loss decreases fastest. *Backward propagation* takes the graph in the state left by forward propagation and computes the gradients starting from the output and moving towards the input; this direction, from the head of the computational graph towards the tail, follows from the chain rule of calculus.
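
These ideas fit together in a few lines of code. The following minimal sketch, in plain Python with no dependency on pydeeptoy, fits a one-variable linear model: `forward` plays the role of forward propagation, `mse` is the loss function, `backward` applies the chain rule by hand, and `gradient_descent` repeatedly steps against the gradient. All function names are illustrative assumptions, not part of the library.

```python
# Hypothetical toy model (not pydeeptoy code): y = w * x + b, trained with
# mean squared error (MSE). Gradients are derived by hand via the chain rule,
# going from the loss back to the parameters w and b.

def forward(w, b, x):
    return w * x + b

def mse(w, b, xs, ts):
    return sum((forward(w, b, x) - t) ** 2 for x, t in zip(xs, ts)) / len(xs)

def backward(w, b, xs, ts):
    n = len(xs)
    dw = db = 0.0
    for x, t in zip(xs, ts):
        dl_dy = 2.0 * (forward(w, b, x) - t) / n  # dL/dy for one sample
        dw += dl_dy * x                            # chain rule: dy/dw = x
        db += dl_dy                                # chain rule: dy/db = 1
    return dw, db

def gradient_descent(xs, ts, w=0.0, b=0.0, lr=0.05, steps=2000):
    for _ in range(steps):
        dw, db = backward(w, b, xs, ts)
        w -= lr * dw  # move against the gradient
        b -= lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 4.0, 7.0, 10.0]  # generated by t = 3x + 1
w, b = gradient_descent(xs, ts)
print(w, b)  # close to 3 and 1
```

A framework automates exactly the `backward` step: instead of deriving each derivative by hand, it composes the local derivatives of the primitive operations in the graph.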

## Computational Graph

The ComputationalGraph class is equipped with methods representing primitive algebraic operations. Each method takes inputs and produces an output. Inputs and outputs are represented by the Connection class, and operations by the Node class. There are two types of connections: constants and variables. Constants do not change during model optimization, whereas variables can be updated by the optimization process. Here is an example of a primitive computational graph that adds two numbers:

```python
from pydeeptoy.computational_graph import *

cg = ComputationalGraph()
# Build (but do not run) a graph that adds two constants.
sum_result = cg.sum(cg.constant(1), cg.constant(2))
```

The code listed above builds the computational graph but does not execute it. To execute the graph, use the SimulationContext class. The simulation context contains the logic for forward and backward propagation. In addition, it stores the results produced by every operation, including the gradients obtained during the backward phase. The code executing the computational graph described above:

```python
from pydeeptoy.computational_graph import *
from pydeeptoy.simulation import *

cg = ComputationalGraph()
sum_result = cg.sum(cg.constant(1), cg.constant(2))

# The simulation context runs the graph and stores the result of every operation.
ctx = SimulationContext()
ctx.forward(cg)
print("1+2={}".format(ctx[sum_result].value))
```
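
To make the split between building a graph and simulating it concrete, here is a minimal, self-contained sketch of how a graph object and a context object could cooperate. The classes `Connection`, `Node`, `Graph` and `Context` and all of their methods are hypothetical and written for illustration; they are not the actual pydeeptoy classes. Nodes are recorded in creation order, so the forward pass walks them front to back, and the backward pass walks them in reverse, accumulating gradients with the chain rule.

```python
class Connection:
    """A graph edge: an input, an output or an intermediate result (hypothetical)."""
    def __init__(self, name, value=None):
        self.name = name
        self.init_value = value

class Node:
    """Base class for an operation with input connections and one output."""
    def __init__(self, inputs, output):
        self.inputs = inputs
        self.output = output

class SumNode(Node):
    def forward(self, a, b):
        return a + b
    def backward(self, a, b, grad):
        return grad, grad            # d(a+b)/da = d(a+b)/db = 1

class MultiplyNode(Node):
    def forward(self, a, b):
        return a * b
    def backward(self, a, b, grad):
        return grad * b, grad * a    # d(ab)/da = b, d(ab)/db = a

class Graph:
    def __init__(self):
        self.nodes = []              # creation order is a valid topological order
    def constant(self, value):
        return Connection("const", value)
    def _add(self, node_cls, a, b):
        out = Connection(node_cls.__name__)
        self.nodes.append(node_cls([a, b], out))
        return out
    def sum(self, a, b):
        return self._add(SumNode, a, b)
    def multiply(self, a, b):
        return self._add(MultiplyNode, a, b)

class Context:
    """Stores the value and the gradient of every connection."""
    def __init__(self):
        self.values = {}
        self.grads = {}
    def value(self, c):
        return self.values.get(c, c.init_value)
    def forward(self, graph):
        for node in graph.nodes:
            args = [self.value(c) for c in node.inputs]
            self.values[node.output] = node.forward(*args)
    def backward(self, graph, out):
        self.grads = {out: 1.0}      # d(out)/d(out) = 1
        for node in reversed(graph.nodes):
            grad = self.grads.get(node.output, 0.0)
            args = [self.value(c) for c in node.inputs]
            for c, g in zip(node.inputs, node.backward(*args, grad)):
                self.grads[c] = self.grads.get(c, 0.0) + g

g = Graph()
x, y = g.constant(3.0), g.constant(4.0)
z = g.sum(g.multiply(x, y), x)       # z = x*y + x
ctx = Context()
ctx.forward(g)
ctx.backward(g, z)
print(ctx.value(z), ctx.grads[x], ctx.grads[y])  # 15.0 5.0 3.0
```

Keeping the values in the context rather than in the graph means the same graph can be simulated several times with different inputs without being rebuilt.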

## Atomic Operations

A computational graph is composed of a set of operations. An operation is the minimal building block of a computational graph; in the framework, it is represented by the abstract Node class. Every operation takes its input as a numpy array or a scalar value and produces either a scalar value or a numpy array. In other words, a computational graph passes tensors through itself, which is where TensorFlow, one of the most popular deep learning frameworks, gets its name. The following operations are implemented in the computational_graph module:

Operation | Description |
---|---|
sum | Computes the sum of two tensors. |
multiply | Computes the element-wise product of two tensors. |
matrix_multiply | Computes the product of two matrices (aka 2-dimensional tensors). |
div | Divides one tensor by another. |
exp | Computes the exponential of all elements in the input tensor. |
log | Natural logarithm, element-wise. |
reduce_sum | Computes the sum of elements across dimensions of a tensor. |
max | Element-wise maximum of tensor elements. |
broadcast | |
transpose | Permutes the dimensions of a tensor. |
reshape | Gives a new shape to a tensor without changing its data. |
conv2d | Computes a 2-D convolution given 4-D input and filter tensors. |
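
For backward propagation to work, every operation in the table needs a matching backward rule that maps the gradient of its output to the gradients of its inputs. As an illustration (a standalone NumPy sketch, not pydeeptoy's code; the function names are hypothetical), here is the backward rule for a matrix product together with a central-difference gradient check, a common way to test such rules.

```python
import numpy as np

def matmul_forward(a, b):
    return a @ b

def matmul_backward(a, b, grad_out):
    # For C = A @ B and upstream gradient G = dL/dC:
    # dL/dA = G @ B.T and dL/dB = A.T @ G
    return grad_out @ b.T, a.T @ grad_out

def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for a scalar-valued f(); x is modified in place and restored."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + eps
        plus = f()
        x[idx] = old - eps
        minus = f()
        x[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4))
b = rng.standard_normal((4, 2))

# Use L = sum(C), so the upstream gradient dL/dC is a matrix of ones.
loss = lambda: matmul_forward(a, b).sum()
da, db = matmul_backward(a, b, np.ones((3, 2)))
print(np.allclose(da, numerical_grad(loss, a), atol=1e-5))  # True
print(np.allclose(db, numerical_grad(loss, b), atol=1e-5))  # True
```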

## Activation Functions

Activation functions transform the output of a single neuron and introduce non-linearity into the model. First, a neuron calculates the weighted sum of its inputs. Second, the weighted sum is fed into the activation function. Finally, the activation function produces the final neuron output. Many activation functions squash their output into a bounded range, for example between 0 and 1 (sigmoid) or between -1 and 1 (tanh). The list of implemented activation functions:
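
Whatever the exact set shipped by the library, the most common activations are short enough to sketch in NumPy. The sketch below defines sigmoid, tanh and ReLU together with their derivatives (which backward propagation needs) and applies them to a single neuron's weighted sum. The function names and the neuron's weights are illustrative assumptions, and the functions expect NumPy arrays or NumPy scalars. Note that ReLU, unlike the other two, is unbounded above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # output in (0, 1)

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh(x):
    return np.tanh(x)                      # output in (-1, 1)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(x, 0.0)              # output in [0, inf)

def relu_grad(x):
    return (x > 0).astype(x.dtype)         # derivative taken as 0 at x == 0

# A single neuron: weighted sum of the inputs, then an activation.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.2, 0.4, -0.1])
b = 0.1
z = w @ x + b
print(sigmoid(z), tanh(z), relu(z))
```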

## Loss Functions

Loss functions are used as a measure of model performance. Usually, a loss is just a scalar value telling how well a model estimates the output from the input. Needless to say, no universal loss function fits all model flavours. The following loss functions are implemented in the losses module:
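
Two losses commonly paired with the estimators in the usage examples are softmax cross-entropy (typical for a multilayer perceptron) and the multiclass hinge loss (typical for an SVM). Below is a NumPy sketch of both, each returning the mean loss and its gradient with respect to the class scores. This is a generic formulation with hypothetical function names, not necessarily how the losses module defines them.

```python
import numpy as np

def softmax_cross_entropy(scores, labels):
    """Mean cross-entropy loss and its gradient w.r.t. the scores.

    scores: (N, C) unnormalized class scores; labels: (N,) integer class ids.
    """
    n = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), labels]).mean()
    grad = probs.copy()
    grad[np.arange(n), labels] -= 1.0                     # softmax minus one-hot
    return loss, grad / n

def multiclass_hinge(scores, labels, margin=1.0):
    """Mean multiclass SVM (hinge) loss and its gradient w.r.t. the scores."""
    n = scores.shape[0]
    correct = scores[np.arange(n), labels][:, None]
    margins = np.maximum(0.0, scores - correct + margin)
    margins[np.arange(n), labels] = 0.0                   # the true class does not count
    loss = margins.sum() / n
    grad = (margins > 0).astype(scores.dtype)
    grad[np.arange(n), labels] = -grad.sum(axis=1)        # pulled down once per violation
    return loss, grad / n

scores = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
print(softmax_cross_entropy(scores, labels)[0])
print(multiclass_hinge(scores, labels)[0])
```

Both functions return the gradient alongside the loss because the loss is the starting point of backward propagation.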

## Usage Examples

The set of primitive building blocks provided by the framework can be used to build robust estimators. The benefit of using the framework is that you do not have to implement forward/backward propagation from scratch for every kind of estimator.

Model | Iris | MNIST | CIFAR-10 |
---|---|---|---|
Support Vector Machine (SVM) | Example | | |
Multilayer Perceptron | Example | Example | |
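
To see what the framework saves you from, here is, for contrast, a hand-written NumPy version of a tiny multilayer perceptron trained on XOR, with the forward pass, softmax cross-entropy loss, backward pass and gradient descent all spelled out. This is a hypothetical standalone sketch, not one of the examples referenced in the table; the layer sizes, learning rate and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not linearly separable, so a hidden layer is required.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])

# Two-layer perceptron: 2 inputs -> 8 hidden units (tanh) -> 2 class scores.
W1 = rng.standard_normal((2, 8)) * 0.5
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 2)) * 0.5
b2 = np.zeros(2)
lr = 0.5

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

losses = []
for step in range(2000):
    # Forward pass and softmax cross-entropy loss.
    h, scores = forward(X)
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    n = len(y)
    losses.append(-np.log(probs[np.arange(n), y]).mean())

    # Backward pass: chain rule from the loss back to every parameter.
    d_scores = probs.copy()
    d_scores[np.arange(n), y] -= 1.0
    d_scores /= n
    dW2 = h.T @ d_scores
    db2 = d_scores.sum(axis=0)
    d_h = d_scores @ W2.T
    d_pre = d_h * (1.0 - h ** 2)  # derivative of tanh
    dW1 = X.T @ d_pre
    db1 = d_pre.sum(axis=0)

    # Gradient descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

predictions = forward(X)[1].argmax(axis=1)
print(losses[0], losses[-1], predictions)
```

Every line of the backward pass here has to be rewritten whenever the architecture changes; with a computational graph, the same gradients are derived automatically from the operations used in the forward pass.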