lightning-bagua

Deep Learning Training Acceleration with Bagua and Lightning AI


Keywords
deep, learning, pytorch, AI
License
Apache-2.0
Install
pip install lightning-bagua==0.1.0

Documentation

Lightning ⚡ Bagua

Deep Learning Training Acceleration with Bagua and Lightning AI


Bagua is a deep learning training acceleration framework that supports multiple advanced distributed training algorithms, including:

  • Gradient AllReduce for centralized synchronous communication, where gradients are averaged among all workers.
  • Decentralized SGD for decentralized synchronous communication, where each worker exchanges data with one or a few specific workers.
  • ByteGrad and QAdam for low precision communication, where data is compressed into low precision before communication.
  • Asynchronous Model Average for asynchronous communication, where workers are not required to synchronize within the same iteration in lock-step fashion.

By default, Bagua uses the Gradient AllReduce algorithm, which is also the algorithm implemented in PyTorch's DistributedDataParallel (DDP), but Bagua can usually achieve higher training throughput thanks to its communication backend written in Rust.
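To select one of the other algorithms listed above, the strategy can be configured explicitly rather than by name. A minimal sketch, assuming the package exposes a `BaguaStrategy` class with an `algorithm` argument (mirroring PyTorch Lightning's built-in Bagua integration; the exact import path and algorithm keys are assumptions, so check the Bagua Tutorials for the authoritative API):

```python
from lightning import Trainer
from lightning_bagua import BaguaStrategy  # assumed import path

# Select Decentralized SGD instead of the default Gradient AllReduce.
# The algorithm key "decentralized" is an assumption based on Bagua's docs.
trainer = Trainer(
    strategy=BaguaStrategy(algorithm="decentralized"),
    accelerator="gpu",
    devices=4,
)
```

Passing `strategy="bagua"` as a plain string, as in the Usage section below, is equivalent to using `BaguaStrategy()` with its defaults.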

Installation

pip install -U lightning-bagua

Usage

Simply set the `strategy` argument of the Trainer:

from lightning import Trainer

# train on 4 GPUs (using Bagua mode)
trainer = Trainer(strategy="bagua", accelerator="gpu", devices=4)

See Bagua Tutorials for more details on installation and advanced features.