Introducing Raven Protocol's Distributed Deep Learning tool that allows Requesters to easily build, train and test their neural networks by leveraging the compute power of participating nodes across the globe.
RavDL can be thought of as a high level wrapper (written in Python) that defines the mathematical backend for building layers of neural networks by utilizing the fundamental operations from Ravop library to provide essential abstractions for training complex DL architectures in the Ravenverse.
This framework seemlessly integrates with the Ravenverse where the models get divided into optimized subgraphs, which get assigned to the participating nodes for computation in a secure manner. Once all subgraphs have been computed, the saved model will be returned to the requester.
In this manner, a requester can securely train complex models without dedicating his or her own system for this heavy and time-consuming task.
There is something in it for the providers too! The nodes that contribute their processing power will be rewarded with tokens proportionate to the capabilities of their systems and duration of participation. More information is available here.
Table of Contents
Installation
Make sure Ravop is installed and working properly.
With PIP
pip install ravdl
Layers
Dense
Dense(n_units, initial_W=None, initial_w0=None, use_bias='True')
Parameters
-
n_units
: Output dimension of the layer -
initial_W
: Initial weights of the layer -
initial_w0
: Initial bias of the layer -
use_bias
: Whether to use bias or not
Shape
- Input: (batch_size, ..., input_dim)
- Output: (batch_size, ..., n_units)
BatchNormalization1D
BatchNormalization1D(momentum=0.99, epsilon=0.01, affine=True, initial_gamma=None, initial_beta=None, initial_running_mean=None, initial_running_var=None)
Parameters
-
momentum
: Momentum for the moving average and variance -
epsilon
: Small value to avoid division by zero -
affine
: Whether to learn the scaling and shifting parameters -
initial_gamma
: Initial scaling parameter -
initial_beta
: Initial shifting parameter -
initial_running_mean
: Initial running mean -
initial_running_var
: Initial running variance
Shape
- Input: (batch_size, channels) or (batch_size, channels, length)
- Output: same as input
BatchNormalization2D
BatchNormalization2D(num_features, momentum=0.99, epsilon=0.01, affine=True, initial_gamma=None, initial_beta=None, initial_running_mean=None, initial_running_var=None)
Parameters
-
num_features
: Number of channels in the input -
momentum
: Momentum for the moving average and variance -
epsilon
: Small value to avoid division by zero -
affine
: Whether to learn the scaling and shifting parameters -
initial_gamma
: Initial scaling parameter -
initial_beta
: Initial shifting parameter -
initial_running_mean
: Initial running mean -
initial_running_var
: Initial running variance
Shape
- Input: (batch_size, channels, height, width)
- Output: same as input
LayerNormalization
LayerNormalization(normalized_shape=None, epsilon=1e-5, initial_W=None, initial_w0=None)
Parameters
-
normalized_shape
: Shape of the input or integer representing the last dimension of the input -
epsilon
: Small value to avoid division by zero -
initial_W
: Initial weights of the layer -
initial_w0
: Initial bias of the layer
Shape
- Input: (batch_size, ...)
- Output: same as input
Dropout
Dropout(p=0.5)
Parameters
-
p
: Probability of dropping out a unit
Shape
- Input: any shape
- Output: same as input
Activation
Activation(name='relu')
Parameters
-
name
: Name of the activation function
Currently Supported: 'relu', 'sigmoid', 'tanh', 'softmax', 'leaky_relu','elu', 'selu', 'softplus', 'softsign', 'tanhshrink', 'logsigmoid', 'hardshrink', 'hardtanh', 'softmin', 'softshrink', 'threshold',
Shape
- Input: any shape
- Output: same as input
Conv2D
Conv2D(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', initial_W=None, initial_w0=None)
Parameters
-
in_channels
: Number of channels in the input image -
out_channels
: Number of channels produced by the convolution -
kernel_size
: Size of the convolving kernel -
stride
: Stride of the convolution -
padding
: Padding added to all 4 sides of the input (int, tuple or string) -
dilation
: Spacing between kernel elements -
groups
: Number of blocked connections from input channels to output channels -
bias
: If True, adds a learnable bias to the output -
padding_mode
: 'zeros', 'reflect', 'replicate' or 'circular' -
initial_W
: Initial weights of the layer -
initial_w0
: Initial bias of the layer
Shape
- Input: (batch_size, in_channels, height, width)
- Output: (batch_size, out_channels, new_height, new_width)
Flatten
Flatten(start_dim=1, end_dim=-1)
Parameters
-
start_dim
: First dimension to flatten -
end_dim
: Last dimension to flatten
Shape
- Input: (batch_size, ...)
- Output: (batch_size, flattened_dimension)
MaxPooling2D
MaxPooling2D(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
Parameters
-
kernel_size
: Size of the max pooling window -
stride
: Stride of the max pooling window -
padding
: Zero-padding added to both sides of the input -
dilation
: Spacing between kernel elements -
return_indices
: If True, will return the max indices along with the outputs -
ceil_mode
: If True, will use ceil instead of floor to compute the output shape
Shape
- Input: (batch_size, channels, height, width)
- Output: (batch_size, channels, new_height, new_width)
Embedding
Embedding(vocab_size, embed_dim, initial_W=None)
Parameters
-
vocab_size
: Size of the vocabulary -
embed_dim
: Dimension of the embedding -
initial_W
: Initial weights of the layer
Shape
- Input: (batch_size, sequence_length)
- Output: (batch_size, sequence_length, embed_dim)
Optimizers
RMSprop
RMSprop(lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
Parameters
-
lr
: Learning rate -
alpha
: Smoothing constant -
eps
: Term added to the denominator to improve numerical stability -
weight_decay
: Weight decay (L2 penalty) -
momentum
: Momentum factor -
centered
: If True, compute the centered RMSProp, the gradient is normalized by an estimation of its variance
Adam
Adam(lr=0.001, betas=(0.9,0.999), eps=1e-08, weight_decay=0, amsgrad=False)
Parameters
-
lr
: Learning rate -
betas
: Coefficients used for computing running averages of gradient and its square -
eps
: Term added to the denominator to improve numerical stability -
weight_decay
: Weight decay (L2 penalty) -
amsgrad
: If True, use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond
Losses
- Mean Squared Error
ravop.square_loss(y_true, y_pred)
- Cross Entropy
ravop.cross_entropy_loss(y_true, y_pred, ignore_index=None, reshape_target=None, reshape_label=None)
Usage
This section gives a more detailed walkthrough on how a requester can define their ML/DL architectures in Python by using RavDL and Ravop functionalities.
Note: The complete scripts of the functionalities demonstrated in this document are available in the Ravenverse Repository.
Authentication and Graph Definition
The Requester must connect to the Ravenverse using a unique token that they can generate by logging in on Raven's Website using their MetaMask wallet credentials.
import ravop as R
R.initialize('<TOKEN>')
In the Ravenverse, each script executed by a requester is treated as a collection of Ravop Operations called Graph.
Note: In the current release, the requester can execute only 1 graph with their unique token. Therefore, to clear any previous/existing graphs, the requester must use
R.flush()
method.
The next step involves the creation of a Graph...
R.flush()
R.Graph(name='cnn', algorithm='convolutional_neural_network', approach='distributed')
Note:
name
andalgorithm
parameters can be set to any string. However, theapproach
needs to be set to either "distributed" or "federated".
The Current Release of RavDL supports TorchScript, Functional and Sequential Model Definitions.
GPU Support
The requester can now utilize GPU resources on provider nodes by using the gpu_required
parameter to yes
in the Graph definition.
R.Graph(name='cnn', algorithm='convolutional_neural_network', approach='distributed', gpu_required='yes')
Important: This feature is currently in beta and is available only for the TorchScript Model Definition.
Functional Model Definition
Define Custom Layers
The latest release of RavDL supports the definition of custom layers by the requester allowing them to write their own application-specific layers either from scratch or as the composition of existing layers.
The custom layer can be defined by inheriting the CustomLayer
class from ravdl.v2.layers
module. The class defined by the requester must implement certain methods shown as follows:
class CustomLayer1(CustomLayer):
def __init__(self) -> None:
super().__init__()
self.d1 = Dense(n_hidden, input_shape=(n_features,))
self.bn1 = BatchNormalization1D(momentum=0.99, epsilon=0.01)
def _forward_pass_call(self, input, training=True):
o = self.d1._forward_pass(input)
o = self.bn1._forward_pass(o, training=training)
return o
class CustomLayer2(CustomLayer):
def __init__(self) -> None:
super().__init__()
self.d1 = Dense(30)
self.dropout = Dropout(0.9)
self.d2 = Dense(3)
def _forward_pass_call(self, input, training=True):
o = self.d1._forward_pass(input)
o = self.dropout._forward_pass(o, training=training)
o = self.d2._forward_pass(o)
return
Defining Custom Model Class
The custom model class can be defined by inheriting the Functional
class from ravdl.v2
module. This feature allows the requester to define their own model class by composing the custom and existing layers.
The class defined by the requester must implement certain methods shown as follows:
class ANNModel(Functional):
def __init__(self, optimizer):
super().__init__()
self.custom_layer1 = CustomLayer1()
self.custom_layer2 = CustomLayer2()
self.act = Activation('softmax')
self.initialize_params(optimizer)
def _forward_pass_call(self, input, training=True):
o = self.custom_layer1._forward_pass(input, training=training)
o = self.custom_layer2._forward_pass(o, training=training)
o = self.act._forward_pass(o)
return o
Note: The
initialize_params
method must be called in the__init__
method of the custom model class. This method initializes the parameters of the model and sets the optimizer for the model.
Defining the Training Loop
The requester can now define their training loop by using the batch_iterator
function from ravdl.v2.utils
module. This function takes the input and target data as arguments and returns a generator that yields a batch of data at each iteration.
Note that the _forward_pass()
and _backward_pass()
methods of the custom model class must be called in the training loop.
optimizer = Adam()
model = ANNModel(optimizer)
epochs = 100
for i in range(epochs):
for X_batch, y_batch in batch_iterator(X, y, batch_size=25):
X_t = R.t(X_batch)
y_t = R.t(y_batch)
out = model._forward_pass(X_t)
loss = R.square_loss(y_t, out)
model._backward_pass(loss)
Make a Prediction
out = model._forward_pass(R.t(X_test), training=False)
out.persist_op(name="prediction")
Note: The
_forward_pass()
method takes an additional argumenttraining
which is set toTrue
by default. This argument is used to determine whether the model is in training mode or not. The_forward_pass()
method must be called withtraining=False
when making predictions.
Complete example scripts of Functional Model can be found here:
Sequential Model Definition
Setting Model Parameters
from ravdl.v2 import NeuralNetwork
from ravdl.v2.optimizers import RMSprop, Adam
from ravdl.v2.layers import Activation, Dense, BatchNormalization1D, Dropout, Conv2D, Flatten, MaxPooling2D
model = NeuralNetwork(optimizer=RMSprop(),loss='SquareLoss')
Adding Layers to Model
model.add(Dense(n_hidden, input_shape=(n_features,)))
model.add(BatchNormalization1D())
model.add(Dense(30))
model.add(Dropout(0.9))
model.add(Dense(3))
model.add(Activation('softmax'))
You can view the summary of model in tabular format...
model.summary()
Training the Model
train_err = model.fit(X, y, n_epochs=5, batch_size=25)
By default, the batch losses for each epoch are made to persist in the Ravenverse and can be retrieved later on as and when the computations of those losses are completed.
Testing the Model on Ravenverse
If required, model inference can be tested by using the predict
function. The output is stored as an Op and should be made to persist in order to view it later on.
y_test_pred = model.predict(X_test)
y_test_pred.persist_op(name='test_prediction')
TorchScript Model Op
RavDL now supports direct loading and training of Pytorch models on the Ravenverse using our powerful distribution strategies.
Note: The Requester must first convert their Pytorch model to TorchScript model using the
torch.jit.script
function. We recommend you to refer to the Pytorch documentation for more information on TorchScript.
Creating the Model Op
The Model Op takes a TorchScript model file (.pt
) as an argument and creates an Op that can be used to train the model on the Ravenverse.
import ravop as R
model_op = R.model('test_model.pt')
This model op can now be loaded directly into a RavDL model and further used to define the training loop.
from ravdl.v2 import Pytorch_Model
from ravdl.v2.optimizers import Adam
optimizer = Adam()
model = Pytorch_Model(model_op=model_op)
model.initialize(optimizer)
Training the Model
for i in range(epochs):
for X_batch, y_batch in batch_iterator(X, y, batch_size=256):
X_t = R.t(X_batch.astype(np.float32))
y_t = R.t(y_batch.astype(np.float32))
out = model._forward_pass(X_t)
loss = R.square_loss(y_t, out)
# Set step = True whenever optimizer step needs to be called after backprop (defaults to True).
model._backward_pass(loss, step = True)
Saving and Fetching the Trained Model
Apart from persisting output Ops, the trained model can also be saved on the Ravenverse. This can be done by calling the save_model
function.
model.save_model(name='my_net')
The saved model can be fetched from the Ravenverse by calling the fetch_persisting_op
function.
my_net = R.fetch_persisting_op(op_name='my_net')
Complete example scripts of TorchScript Model loading and training can be found here:
Activating the Graph
Once the model has been defined (Functional/Sequential) and all required Ops for the Graph have been defined, then Graph can be activated and made ready for execution as follows:
R.activate()
Here is what should happen on activating the Graph (the script executed below is available here):
Executing the Graph
Once the Graph has been activated, no more Ops can be added to it. The Graph is now ready for execution. Once Ravop has been initialized with the token, the graph can be executed and tracked as follows:
R.execute()
R.track_progress()
Here is what should happen on executing the Graph (the script executed below is available here):
Retrieving Persisting Ops
As mentioned above, the batch losses for each epoch can be retrieved as and when they have been computed. The entire Graph need not be computed in order to view a persisting Op that has been computed. Any other Ops that have been made to persist, such as y_test_pred
in the example above, can be retrieved as well.
batch_loss = R.fetch_persisting_op(op_name="training_loss_epoch_{}_batch_{}".format(epoch_no, batch_no))
print("training_loss_epoch_1_batch_1: ", batch_loss)
y_test_pred = R.fetch_persisting_op(op_name="test_prediction")
print("Test prediction: ", y_test_pred)
Note: The Ops that have been fetched are of type torch.Tensor.
License
This project is licensed under the MIT License - see the LICENSE file for details