PyTorch Model Training and Experiment Tracking Framework

pip install aitoolbox==1.4.0


AI Toolbox



AIToolbox is a framework that helps you train deep learning models in PyTorch and iterate quickly on experiments. It hides the repetitive technicalities of training neural nets and frees you to focus on the interesting part: devising new models. In essence, it offers a Keras-style train loop abstraction that handles the high-level training process while still allowing manual control at the lower level when desired.

In addition to orchestrating the model training loop, the framework helps you keep track of different experiments by automatically saving models in a structured, traceable way and creating performance reports. These can be stored locally or on AWS S3 (Google Cloud Storage support is in beta), which makes the library very useful when training on a GPU instance on AWS. The instance can be shut down automatically when training finishes, with all the results safely stored on S3.


Installation

To install the AIToolbox package, execute:

pip install aitoolbox

To install the most recent version from the GitHub repository, first clone the package repository and then install it with pip:

git clone

pip install ./aitoolbox

The AIToolbox package can also be listed as a dependency in the requirements.txt file by specifying the aitoolbox dependency. Alternatively, to automatically download the current master branch from GitHub, include the following dependency specification in requirements.txt:



TrainLoop

TrainLoop is the main abstraction for PyTorch neural net training. At its core, it handles feeding batches of data into the model, calculating the loss and updating the parameters for a specified number of epochs. To learn how to define a PyTorch model supported by the TrainLoop, see the Model section below.
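To make concrete what such a loop automates, here is a framework-free sketch in plain Python (no PyTorch and no AIToolbox involved). It fits a two-parameter linear model with mini-batch gradient descent. The function name `fit_linear` and all of its parameters are illustrative assumptions, not part of the AIToolbox API.

```python
import random


def fit_linear(data, num_epochs=100, batch_size=4, lr=0.1, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on squared error.

    Illustrative stand-in for the work a train loop does; not AIToolbox code.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    epoch_losses = []

    for _ in range(num_epochs):
        rng.shuffle(data)  # new batch order every epoch
        total_loss = 0.0

        # Batch feeding: walk over the data in fixed-size slices
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]

            # Forward pass and per-example errors
            errors = [(w * x + b) - y for x, y in batch]
            total_loss += sum(e * e for e in errors)

            # Gradients of the batch-mean squared error w.r.t. w and b
            grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)

            # Parameter update
            w -= lr * grad_w
            b -= lr * grad_b

        epoch_losses.append(total_loss / len(data))

    return w, b, epoch_losses


# Noise-free samples from y = 2x + 1
samples = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
w, b, losses = fit_linear(samples)
```

The three commented steps inside the inner loop (batch feeding, loss calculation, parameter update) are the parts a train loop abstraction takes over, so that only the model-specific pieces remain to be written.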

After the model is created, the simplest way to train it via the TrainLoop abstraction is by doing the following:

from aitoolbox.torchtrain.train_loop import *

tl = TrainLoop(model,
               train_loader, val_loader, test_loader,
               optimizer, criterion)

model = tl.fit(num_epochs=10)

AIToolbox includes a few more advanced derivations of the basic TrainLoop which automatically handle the experiment tracking by creating model checkpoints, performance reports, example predictions, etc. All of this can be saved just on the local drive or can also be automatically stored on AWS S3. Currently implemented advanced TrainLoops are TrainLoopCheckpoint, TrainLoopEndSave and TrainLoopCheckpointEndSave. Here, 'Checkpoint' stands for checkpointing after each epoch, while 'EndSave' will only persist and evaluate at the very end of the training.
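The difference between the 'Checkpoint' and 'EndSave' policies can be sketched with the standard library alone. In this hypothetical example, JSON files stand in for real model snapshots and reports; the function `run_training`, its flags and the file names are assumptions made for illustration and do not reflect AIToolbox's actual folder layout.

```python
import json
import tempfile
from pathlib import Path


def run_training(num_epochs, out_dir, checkpoint=False, end_save=False):
    """Simulate training and persist state according to the saving policy.

    Hypothetical sketch: the JSON files only stand in for model snapshots.
    """
    out_dir = Path(out_dir)
    state = {"epoch": 0, "loss": 1.0}

    for epoch in range(1, num_epochs + 1):
        state = {"epoch": epoch, "loss": 1.0 / (epoch + 1)}  # fake progress

        if checkpoint:
            # 'Checkpoint': persist after every epoch
            path = out_dir / f"checkpoint_epoch_{epoch}.json"
            path.write_text(json.dumps(state))

    if end_save:
        # 'EndSave': persist (and evaluate) once, after the last epoch
        (out_dir / "final_model.json").write_text(json.dumps(state))

    return sorted(p.name for p in out_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    saved_files = run_training(3, tmp, checkpoint=True, end_save=True)

with tempfile.TemporaryDirectory() as tmp:
    end_only_files = run_training(3, tmp, end_save=True)
```

With both flags enabled the run leaves one file per epoch plus a final one, mirroring the combined behaviour the name TrainLoopCheckpointEndSave suggests.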

For the most complete experiment tracking, the TrainLoopCheckpointEndSave option is recommended. The optional result packages needed for evaluating neural net performance are explained in the experiment section below.

from aitoolbox.torchtrain.train_loop import *

TrainLoopCheckpointEndSave(
    model,
    train_loader, validation_loader, test_loader,
    optimizer, criterion,
    project_name, experiment_name, local_model_result_folder_path,
    hyperparams, val_result_package=None, test_result_package=None,
    cloud_save_mode='s3', bucket_name='models', cloud_dir_prefix=''
).fit(num_epochs=10)

Check out a full TrainLoop training & experiment tracking example.

Multi-GPU training

In addition to single-GPU training, all TrainLoop versions support multi-GPU training for faster training. Following the core PyTorch setup, two multi-GPU approaches are available: DataParallel and DistributedDataParallel.

DataParallel (DP)

To use DataParallel-style multi-GPU training, set the TrainLoop's gpu_mode parameter to 'dp':

from aitoolbox.torchtrain.train_loop import *

model = ... # TTModel

TrainLoop(
    model,
    train_loader, val_loader, test_loader,
    optimizer, criterion,
    gpu_mode='dp'
).fit(num_epochs=10)

Check out a full DataParallel training example.

DistributedDataParallel (DDP)

Distributed training on multiple GPUs via DistributedDataParallel is enabled by the TrainLoop itself, which wraps the model (a TTModel, see the Model section) into DistributedDataParallel under the hood. TrainLoop also automatically spawns multiple processes and initializes them. Inside each spawned process, the model and all other necessary training components are moved to the GPU belonging to that process. Lastly, TrainLoop automatically adds the PyTorch DistributedSampler to each of the provided data loaders to ensure that different data batches go to different GPUs without overlap.
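The partitioning idea behind a distributed sampler can be sketched in a few lines of plain Python. This is a simplified illustration without shuffling, not the PyTorch implementation; the function name `shard_indices` is an assumption, and the sketch assumes the dataset has at least as many samples as there are processes.

```python
import math


def shard_indices(dataset_size, num_replicas, rank):
    """Return the dataset indices assigned to one process (no shuffling).

    Simplified sketch of the partitioning idea; assumes
    dataset_size >= num_replicas.
    """
    per_replica = math.ceil(dataset_size / num_replicas)
    total_size = per_replica * num_replicas

    indices = list(range(dataset_size))
    # Pad by repeating indices from the start so every rank gets the same count
    indices += indices[: total_size - dataset_size]

    # Rank r takes every num_replicas-th index, starting at offset r
    return indices[rank:total_size:num_replicas]


# 8 samples over 4 processes: an even split with no overlap
even_shards = [shard_indices(8, num_replicas=4, rank=r) for r in range(4)]

# 10 samples over 4 processes: two indices are repeated as padding
padded_shards = [shard_indices(10, num_replicas=4, rank=r) for r in range(4)]
```

When the dataset size divides evenly by the number of processes, the shards are disjoint. Otherwise, in this sketch a few samples are repeated so that every process sees the same number of batches per epoch.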

To enable distributed training via DistributedDataParallel, the user has to set the TrainLoop's gpu_mode parameter to 'ddp'.

import torch
from aitoolbox.torchtrain.train_loop import *

model = ... # TTModel

TrainLoop(
    model,
    train_loader, val_loader, test_loader,
    optimizer, criterion,
    gpu_mode='ddp'
).fit(num_epochs=10, callbacks=None,
      num_nodes=1, node_rank=0, num_gpus=torch.cuda.device_count())

Check out a full DistributedDataParallel training example.

Automatic Mixed Precision training (AMP)

All the TrainLoop versions also support training with Automatic Mixed Precision (AMP). In the past this required NVIDIA's Apex extension, but from PyTorch 1.6 onwards AMP functionality is built into core PyTorch and no separate installation is needed. The current version of AIToolbox already supports the built-in PyTorch AMP.

To use the default AMP initialization and train the model in mixed precision mode, the user only has to set the TrainLoop parameter use_amp=True. To specify custom AMP GradScaler initialization parameters, provide them as a dict instead, e.g. use_amp={'init_scale': 2.**16, 'growth_factor': 2.0, ...}. All AMP initialization and training-related steps are then handled automatically by the TrainLoop.
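To give a feel for what parameters such as init_scale and growth_factor control, the following plain-Python sketch simulates dynamic loss scaling. It is a simplified model of the behaviour documented for PyTorch's GradScaler, not its implementation; the class name `LossScaleSketch` is an assumption, and the defaults mirror the documented GradScaler defaults.

```python
class LossScaleSketch:
    """Simplified dynamic loss scaling, modelled on GradScaler's documented behaviour."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0  # consecutive steps without inf/NaN gradients

    def update(self, found_inf):
        """Adjust the scale after one optimizer step attempt and return it."""
        if found_inf:
            # Overflow: shrink the scale and restart the streak
            self.scale *= self.backoff_factor
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps == self.growth_interval:
                # Long enough without overflow: try a larger scale
                self.scale *= self.growth_factor
                self._clean_steps = 0
        return self.scale


# Short growth interval so the effect is visible in a handful of steps
scaler = LossScaleSketch(init_scale=2.0 ** 16, growth_factor=2.0, growth_interval=3)
history = [scaler.update(found_inf) for found_inf in [False, False, False, True, False]]
```

In this run the scale doubles after three clean steps and is halved again by the simulated overflow on the fourth update.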

You can read more about different AMP details in the PyTorch AMP documentation.

Single-GPU mixed precision training

Example of single-GPU AMP setup:

from aitoolbox.torchtrain.train_loop import *

model = ... # TTModel

TrainLoop(
    model, ...,
    optimizer, criterion,
    use_amp=True
).fit(num_epochs=10)

Check out a full AMP single-GPU training example.

Multi-GPU DDP mixed precision training

When training in the multi-GPU setting, the setup is mostly the same as in the single-GPU case. All the user has to do is set the TrainLoop's use_amp parameter accordingly and switch its gpu_mode parameter to 'ddp'. Under the hood, TrainLoop will initialize the model and the optimizer for AMP and start training using the DistributedDataParallel approach.

Example of multi-GPU AMP setup:

from aitoolbox.torchtrain.train_loop import *

model = ... # TTModel

TrainLoop(
    model, ...,
    optimizer, criterion,
    gpu_mode='ddp', use_amp=True
).fit(num_epochs=10)

Check out a full AMP multi-GPU DistributedDataParallel training example.


Model

To take advantage of the TrainLoop abstraction, the user has to define their model as a class, which is the standard way in core PyTorch as well. The only difference is that for TrainLoop-supported training the model class has to inherit from the AIToolbox-specific TTModel base class instead of PyTorch's nn.Module.

TTModel itself inherits from the usual nn.Module class, so models retain all the expected PyTorch functionality. The reason for the TTModel superclass is that TrainLoop requires users to implement two additional methods that describe how each batch of data is fed into the model: one for calculating the loss in training mode and one for making predictions in evaluation mode.

The code below shows the general skeleton all the TTModels have to follow to enable them to be trained with the TrainLoop:

from aitoolbox.torchtrain.model import TTModel

class MyNeuralModel(TTModel):
    def __init__(self):
        super().__init__()
        # model layers, etc.

    def forward(self, x_data_batch):
        # The same method as required in the base PyTorch nn.Module
        # return prediction
        ...

    def get_loss(self, batch_data, criterion, device):
        # Get loss during training stage, called from fit() in TrainLoop
        # return batch loss
        ...

    def get_predictions(self, batch_data, device):
        # Get predictions during evaluation stage
        # + return any metadata potentially needed for evaluation
        # return predictions, true_targets, metadata
        ...
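The division of labour between the two extra methods can be illustrated without PyTorch. In the sketch below, `BatchModelContract` is a hypothetical stand-in for the TTModel contract, and `ThresholdModel`, `error_rate` and the metadata key are likewise invented for illustration. Only the method names and argument lists of get_loss and get_predictions are taken from the skeleton above.

```python
from abc import ABC, abstractmethod


class BatchModelContract(ABC):
    """Hypothetical stand-in for the two extra methods a TTModel provides."""

    @abstractmethod
    def get_loss(self, batch_data, criterion, device):
        """Return the loss for one training batch."""

    @abstractmethod
    def get_predictions(self, batch_data, device):
        """Return (predictions, true_targets, metadata) for one batch."""


class ThresholdModel(BatchModelContract):
    """Toy classifier: predicts 1 when the input exceeds a fixed threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def forward(self, inputs):
        return [1 if x > self.threshold else 0 for x in inputs]

    def get_loss(self, batch_data, criterion, device):
        inputs, targets = batch_data  # the model decides how to unpack a batch
        return criterion(self.forward(inputs), targets)

    def get_predictions(self, batch_data, device):
        inputs, targets = batch_data
        return self.forward(inputs), targets, {"num_examples": len(inputs)}


def error_rate(predictions, targets):
    """Fraction of wrong predictions; plays the role of the criterion."""
    return sum(p != t for p, t in zip(predictions, targets)) / len(targets)


batch = ([0.1, 0.7, 0.9, 0.4], [0, 1, 0, 0])
model = ThresholdModel()
loss = model.get_loss(batch, error_rate, device="cpu")
predictions, true_targets, metadata = model.get_predictions(batch, device="cpu")
```

Because the model itself knows how to unpack a batch, the surrounding loop can stay generic: it only passes batches through and never needs to know their structure.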


Callbacks

For advanced applications, the basic logic offered by the default TrainLoops might not be enough. Additional logic can be injected into the training procedure by using callbacks and providing them as a list to the TrainLoop's fit(callbacks=[callback_1, callback_2, ...]) function.

AIToolbox already offers a wide selection of useful callbacks by default. However, when some completely new functionality is desired, the user can implement their own callbacks by inheriting from the base callback object AbstractCallback. All the user has to do is implement the corresponding methods to execute the new callback at the desired point in the train loop, such as the start or end of a batch, an epoch or the whole training.
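The hook mechanism itself is a small pattern that can be sketched in plain Python. The base class `CallbackSketch`, its hook names and the toy `fit` function below are assumptions made for illustration; consult AbstractCallback in the package for the actual method names and signatures.

```python
class CallbackSketch:
    """Base class with no-op hooks; subclasses override the ones they need."""

    def on_train_begin(self):
        pass

    def on_epoch_begin(self, epoch):
        pass

    def on_batch_end(self, epoch, batch_idx):
        pass

    def on_epoch_end(self, epoch):
        pass

    def on_train_end(self):
        pass


class EventRecorder(CallbackSketch):
    """Example callback: records which of its hooks fired, in order."""

    def __init__(self):
        self.events = []

    def on_train_begin(self):
        self.events.append("train_begin")

    def on_epoch_end(self, epoch):
        self.events.append(f"epoch_{epoch}_end")

    def on_train_end(self):
        self.events.append("train_end")


def fit(num_epochs, num_batches, callbacks=None):
    """Toy loop that only dispatches hooks; the training work is omitted."""
    callbacks = callbacks or []
    for cb in callbacks:
        cb.on_train_begin()
    for epoch in range(num_epochs):
        for cb in callbacks:
            cb.on_epoch_begin(epoch)
        for batch_idx in range(num_batches):
            for cb in callbacks:
                cb.on_batch_end(epoch, batch_idx)
        for cb in callbacks:
            cb.on_epoch_end(epoch)
    for cb in callbacks:
        cb.on_train_end()


recorder = EventRecorder()
fit(num_epochs=2, num_batches=3, callbacks=[recorder])
```

Since the base class supplies no-op defaults, a custom callback only overrides the hooks it cares about and silently ignores the rest.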


experiment

Result Package

This defines the model evaluation procedure for the task we are experimenting with. Result packages available out of the box can be found in the result_package module, which implements several basic, general result packages. For those working on NLP, result packages for several widely researched tasks, such as translation and QA, are part of the NLP module. Finally, the framework was built with extensibility in mind, so users can easily define their own result packages with custom evaluations by extending the base AbstractResultPackage.

Under the hood, the result package executes one or more metric objects which perform the actual performance calculations. The result package object is thus a wrapper around the potentially multiple performance calculations needed for the task. The metrics that are part of the specified result package are calculated by calling the prepare_result_package() method of the result package used to evaluate the model's performance.
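The wrapper idea can be sketched as follows. `ResultPackageSketch` and the two metric functions are hypothetical stand-ins; only the method name prepare_result_package comes from the text above, and the argument list used here is an assumption rather than the library's signature.

```python
class ResultPackageSketch:
    """Hypothetical wrapper that evaluates several metrics on the same predictions."""

    def __init__(self, metrics):
        self.metrics = metrics  # mapping: metric name -> function(y_true, y_pred)
        self.results = None

    def prepare_result_package(self, y_true, y_pred):
        """Run every wrapped metric and keep the results in one dict."""
        self.results = {name: fn(y_true, y_pred) for name, fn in self.metrics.items()}
        return self.results


def accuracy(y_true, y_pred):
    """Share of predictions that match the targets exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


package = ResultPackageSketch({"accuracy": accuracy, "mae": mean_absolute_error})
results = package.prepare_result_package(y_true=[1, 0, 1, 1], y_pred=[1, 0, 0, 1])
```

Grouping the metrics behind one call means the training loop only needs to hand over targets and predictions once, regardless of how many metrics a task requires.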

Experiment Saver

The experiment saver saves the model architecture as well as the model performance evaluation results and training history. This can be done at the end of each epoch, as model checkpointing, or at the end of training.

The experiment saver is normally not a point of great interest when using the TrainLoop interface, as it is hidden under the hood. However, since AIToolbox was designed to be modular, one can write their own training loop logic and use only the provided experiment saver module to help with experiment tracking and model saving. For PyTorch users we recommend the FullPyTorchExperimentS3Saver, which is also the most thoroughly tested. The experiment is saved by calling the save_experiment() function of the selected experiment saver and providing the trained model and the evaluated result package containing the calculated performance results.


cloud

All of these modules are mostly hidden under the hood when using the different experiment tracking abstractions. However, if only the cloud saving functionality is needed, they are easy to use as standalone modules in a downstream application.


AWS S3

This module provides functionality for saving the model architecture and training results to S3, either during training or at the end of training. It also supports downloading datasets from an S3-based dataset store. This is useful when experimenting with datasets over a slow local connection, where scp/FTP is out of the picture.
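As a rough illustration of the saving side, the sketch below uploads a local results folder to a bucket while preserving relative paths. The helper `upload_folder` and the `RecordingClient` test double are assumptions for illustration and are not AIToolbox code. The only external call relied upon is `upload_file(Filename, Bucket, Key)`, which a boto3 S3 client provides; in real use, pass `boto3.client("s3")` instead of the recording stand-in.

```python
import os
import tempfile


def upload_folder(s3_client, local_dir, bucket_name, cloud_dir_prefix=""):
    """Upload every file under local_dir to the bucket, preserving relative paths.

    s3_client must expose upload_file(Filename, Bucket, Key), as a boto3
    S3 client does. Hypothetical helper, not part of AIToolbox.
    """
    uploaded_keys = []
    for root, _dirs, files in os.walk(local_dir):
        for file_name in files:
            local_path = os.path.join(root, file_name)
            relative = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            # Join prefix and relative path, skipping an empty prefix
            key = "/".join(p for p in (cloud_dir_prefix.strip("/"), relative) if p)
            s3_client.upload_file(local_path, bucket_name, key)
            uploaded_keys.append(key)
    return sorted(uploaded_keys)


class RecordingClient:
    """Offline stand-in for an S3 client that only records the calls it gets."""

    def __init__(self):
        self.calls = []

    def upload_file(self, Filename, Bucket, Key):
        self.calls.append((Bucket, Key))


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny fake experiment folder
    os.makedirs(os.path.join(tmp, "checkpoints"))
    for name in ("results.json", os.path.join("checkpoints", "epoch_1.pt")):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("placeholder")

    client = RecordingClient()
    keys = upload_folder(client, tmp, "models", cloud_dir_prefix="my_project/exp_1")
```

Passing the client in as an argument keeps the helper testable offline and leaves credential handling to the caller.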

Google Cloud

The same functionality as for AWS S3, but for Google Cloud Storage. It is implemented but not yet tested in practice.


nlp

Currently, this module is mainly used for the performance evaluation result packages needed for different NLP tasks, such as Q&A, summarization and machine translation.

For tasks such as NMT, the module also provides attention heatmap plotting, which is often helpful for gaining additional insights into a seq2seq model. The heatmap plotter creates attention heatmap plots for every validation example and saves them as pictures to disk (and potentially also to the cloud).

Lastly, the nlp module also provides several rudimentary NLP data processing functions.

AWS GPU instance prep and management bash scripts

As some of the tasks involved in training models on AWS cloud GPUs are quite repetitive, the package also includes several useful bash scripts that automate tasks such as instance initialization and bootstrapping, experiment file updating, remote AIToolbox installation updating, etc.

For further information look into the /bin/AWS folder and read the provided README.

Examples of package usage

Look into the /examples folder for starters. More examples of different training scenarios will be added.