Very Long Recurrent Neural Networks

RNNs trained on long sequences usually have an unfavorable ratio of GPU utilization to memory consumption. Processing long sequences recurrently generally does not allow parallelization over the time dimension, as future activations depend on past activations. The only option for parallelization is over the batch dimension (increasing the batch size). At the same time, long sequences give rise to large memory consumption when computing gradients with common automatic differentiation techniques. Usually, in the forward pass, all activations in all layers and time steps are computed and stored in GPU memory. In the backward pass, the loss is differentiated and deltas are propagated back through the network, where, together with the stored activations, they are used to compute the weight updates (https://en.wikipedia.org/wiki/Backpropagation_through_time). The memory demand of stored activations scales linearly with both the sequence length and the batch size, so increasing the batch size does not improve the utilization/memory ratio. With limited GPU memory, reasonable GPU utilization may therefore be out of reach.
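To make the linear scaling concrete, here is a rough back-of-the-envelope estimate. It assumes float32 values and counts only one hidden-state vector per layer, time step and sequence; real cells (an LSTM, for instance) keep several tensors per step, so actual usage is higher. The function name is hypothetical and not part of VLRNN.

```python
def stored_activation_bytes(seq_len: int, batch_size: int, hidden_size: int,
                            num_layers: int = 1, bytes_per_value: int = 4) -> int:
    """Lower bound on the memory for hidden states that BPTT keeps for the backward pass."""
    return seq_len * batch_size * hidden_size * num_layers * bytes_per_value


# 10,000 steps, batch of 64, 512 hidden units, 2 layers, float32 (4 bytes per value)
needed = stored_activation_bytes(10_000, 64, 512, num_layers=2)
print(f"{needed / 2**30:.2f} GiB")  # 2.44 GiB; a batch of 128 would need twice as much
```

Doubling the batch size doubles the stored activations, which is why a larger batch does not buy better utilization per byte of GPU memory.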

Solution

VLRNN allows one to efficiently compute forward and backward passes of RNNs for (almost) arbitrarily long sequences. The memory efficiency comes at the cost of one additional forward pass without gradient computation.
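The trade-off can be illustrated with a simplified cost model. This model is our own assumption, not taken from VLRNN: it counts one unit of memory per stored hidden state and ignores weights, losses and per-cell overhead. With blocks of length k over T steps, one keeps about T/k block-entry states plus the activations of a single block, while the forward direction is evaluated twice (the extra pass without gradients plus the per-block recomputation).

```python
import math


def peak_stored_states(seq_len: int, block_len: int) -> int:
    """Hidden states held at once: one entry checkpoint per block plus one block's activations."""
    return math.ceil(seq_len / block_len) + block_len


def recurrent_steps(seq_len: int, blockwise: bool) -> int:
    """Forward cell evaluations: the blockwise scheme adds one pass without gradients."""
    return 2 * seq_len if blockwise else seq_len


T = 10_000
best = min(range(1, T + 1), key=lambda k: peak_stored_states(T, k))
print(best, peak_stored_states(T, best))  # 100 200, versus about 10000 states for plain BPTT
```

Under this model the memory minimum lies at a block length of about the square root of T, while forward compute roughly doubles; in practice the block length would more likely be chosen so that one block fits the available GPU memory at the desired batch size.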

works with arbitrary RNN architectures that exhibit strictly sequential processing
multi-layer RNN
PackedSequence (https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.PackedSequence.html) support for batches of variable-length sequences
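For packed sequences, PyTorch stores the data time-major: the rows for time step 0 come first, then those for step 1, and so on, with `batch_sizes[t]` giving the number of sequences still active at step t. The stdlib sketch below shows how a block of time steps maps to a contiguous row range of that data. Both function names are hypothetical illustrations, not VLRNN's code.

```python
from typing import List, Tuple


def packed_batch_sizes(lengths: List[int]) -> List[int]:
    """Number of sequences still active at each time step, as in PackedSequence.batch_sizes."""
    max_len = max(lengths, default=0)
    return [sum(1 for n in lengths if n > t) for t in range(max_len)]


def block_row_range(batch_sizes: List[int], t_start: int, t_stop: int) -> Tuple[int, int]:
    """Rows of the time-major packed data covering time steps [t_start, t_stop)."""
    return sum(batch_sizes[:t_start]), sum(batch_sizes[:t_stop])


sizes = packed_batch_sizes([3, 2, 2, 1])  # [4, 3, 1]
rows = block_row_range(sizes, 1, 3)       # (4, 8): the rows for steps 1 and 2
```

Because the effective batch only shrinks over time, later blocks of a packed batch process fewer rows than earlier ones.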

Limitations

the overall loss must be a linear function of the per-timestep losses l_t
no bidirectional RNNs
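One way to read the linearity requirement: if L = Σ_t w_t l_t, then ∂L/∂l_t = w_t is a constant, so the deltas entering a block from its losses can be computed from that block's outputs alone. For a non-linear combination, ∂L/∂l_t depends on losses in other blocks, which are not all in memory at the same time. The finite-difference check below illustrates this; the loss functions are hypothetical examples.

```python
from typing import Callable, List


def dloss_dl(total: Callable[[List[float]], float], losses: List[float], t: int,
             eps: float = 1e-6) -> float:
    """Central finite-difference estimate of dL/dl_t."""
    up, down = list(losses), list(losses)
    up[t] += eps
    down[t] -= eps
    return (total(up) - total(down)) / (2 * eps)


def linear_total(losses: List[float]) -> float:
    return sum(0.5 * loss for loss in losses)  # L = sum_t w_t * l_t with w_t = 0.5


def squared_total(losses: List[float]) -> float:
    return sum(losses) ** 2  # L = (sum_t l_t)^2, not linear in the l_t


# Change l_2 (think: a loss in another block) and watch dL/dl_0.
print(dloss_dl(linear_total, [1.0, 2.0, 3.0], 0), dloss_dl(linear_total, [1.0, 2.0, 10.0], 0))
print(dloss_dl(squared_total, [1.0, 2.0, 3.0], 0), dloss_dl(squared_total, [1.0, 2.0, 10.0], 0))
```

The first line prints about 0.5 twice; the second prints about 12 and then about 26, showing that a non-linear total couples blocks through their losses.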

Usage

To be written.

How does it work

The proposed solution in VLRNN is to perform forward/backward computations in blocks of short sequence length, such that all activations inside a block fit into GPU memory even at decent batch sizes. To compute updates for a block in the middle of the sequence, we need

activations x_t at the static input of the block,
the latent input h_{t-1} of the block (the RNN hidden state or memory cells),
the deltas 𝛿z_t flowing into the block from the sequence losses,
and the deltas 𝛿h_{t+𝛥t} at the end of the block (flowing back from backpropagating the adjacent block).

Except for the latent (hidden state) activations and deltas, everything is available. For 𝛿z_t we run a forward/backward pass through the output of the block, including the loss. For the latent activations we first run a forward pass through the network with gradient computation disabled and compute (and keep in GPU memory) the latent activations at the block entry points for all N blocks. Then we run the usual forward/backward passes through each block, from last to first, accumulate the gradients of all (shared) weights in the block, and release all activations and deltas of this block from GPU memory, except the delta 𝛿h_{t-1} at the block's latent input. 𝛿h_{t-1} is fed into the backward pass of the preceding block, where it serves as that block's end-of-block delta.
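The two passes described above can be sketched in plain Python, assuming a toy scalar RNN h_t = tanh(w·h_{t-1} + u·x_t) with a squared-error loss at every step. All function names are hypothetical, and this is not VLRNN's implementation: the first pass keeps only the hidden state at each block entry, and the second recomputes one block at a time from last to first, carrying only the delta at the block's latent input to the preceding block. Running a single block over the whole sequence is plain BPTT, which gives a reference to compare against.

```python
import math
from typing import List, Tuple

# Toy model (an assumption for illustration):
#   h_t = tanh(w * h_{t-1} + u * x_t),  L = sum_t 0.5 * (h_t - y_t)^2


def block_forward(w: float, u: float, h_in: float, xs: List[float]) -> List[float]:
    """Run the recurrence over one block; returns [h_in, h_1, ..., h_k]."""
    hs = [h_in]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def block_backward(w: float, hs: List[float], xs: List[float], ys: List[float],
                   dh_out: float) -> Tuple[float, float, float]:
    """Backpropagate through one block, given the delta dh_out at its last hidden state.

    Returns (grad_w, grad_u, delta at the block's latent input h_in).
    """
    gw = gu = 0.0
    dh = dh_out
    for t in reversed(range(len(xs))):
        h_prev, h = hs[t], hs[t + 1]
        dh += h - ys[t]            # delta from the per-step loss l_t
        da = dh * (1.0 - h * h)    # back through tanh
        gw += da * h_prev
        gu += da * xs[t]
        dh = da * w                # delta passed on to h_{t-1}
    return gw, gu, dh


def blockwise_grads(w: float, u: float, h0: float, xs: List[float], ys: List[float],
                    block_len: int) -> Tuple[float, float]:
    starts = list(range(0, len(xs), block_len))
    # Pass 1 (no gradients needed): retain only the hidden state at each block entry.
    entry_states = []
    h = h0
    for s in starts:
        entry_states.append(h)
        h = block_forward(w, u, h, xs[s:s + block_len])[-1]
    # Pass 2: last block to first; recompute a block, backprop, hand dh to the previous block.
    gw = gu = 0.0
    dh = 0.0  # nothing flows back into the final block
    for s, h_in in zip(reversed(starts), reversed(entry_states)):
        xb, yb = xs[s:s + block_len], ys[s:s + block_len]
        hs = block_forward(w, u, h_in, xb)
        bgw, bgu, dh = block_backward(w, hs, xb, yb, dh)
        gw += bgw
        gu += bgu
    return gw, gu


def total_loss(w: float, u: float, h0: float, xs: List[float], ys: List[float]) -> float:
    hs = block_forward(w, u, h0, xs)
    return sum(0.5 * (h - y) ** 2 for h, y in zip(hs[1:], ys))


xs = [math.sin(0.3 * t) for t in range(20)]
ys = [0.5 * math.cos(0.2 * t) for t in range(20)]
full = blockwise_grads(0.8, 0.5, 0.1, xs, ys, block_len=len(xs))  # one block: plain BPTT
chunked = blockwise_grads(0.8, 0.5, 0.1, xs, ys, block_len=6)     # blocks of 6, 6, 6, 2 steps
print(full, chunked)  # equal up to floating-point rounding
```

Since the recomputed activations are bit-identical to those of the first pass, the blockwise gradients differ from plain BPTT only in the order in which per-block contributions are summed.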

Installation

All you need is

$ pip install vlrnn

Development

To help develop VLRNN, clone the GitHub repository and change to the cloned directory on the command line. Then run

$ pip install -e .
$ pytest tests/

The first command installs the package in editable mode into your Python environment, so changes to files in the directory are reflected the next time the package is imported; the second runs the test suite.

License

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

vlrnn
Release 0.0.1
