Rnnlab
Python API to train and compare RNN language models using TensorFlow
Installation:
Install rnnlab via pip:
pip install rnnlab
This also installs the required Python modules, including numpy, tensorflow, pandas, sklearn, matplotlib, flask, spacy, and seaborn.
Environment Variables:
First, tell rnnlab where to save its log and training data by creating environment variables in your .bashrc file:
export RUNS_DIR='<path to where you want to save model data>'
export BACKUP_DIR_DIR='<path to where you want to backup model data>'
export RNNLAB_DIR='<path to where you want to save the log files>'
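Before training, it can help to confirm that the variables are visible to Python (for example, after opening a new terminal). The following is a minimal sketch, not part of rnnlab: the helper name find_missing_vars is hypothetical, and the variable names are copied verbatim from the export lines above.

```python
import os

# Variable names copied from the export lines above.
REQUIRED_VARS = ("RUNS_DIR", "BACKUP_DIR_DIR", "RNNLAB_DIR")


def find_missing_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; rnnlab does not ship it.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = find_missing_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("All rnnlab directories are configured.")
```

If any name is reported as not set, re-source the .bashrc file or open a new terminal and try again.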
Configs:
Before training can begin, run the following command to create a configs file in RNNLAB_DIR:
python train.py
A variety of training hyperparameters and other settings may be specified in the configs file; not all are required. Each row represents one configuration, and multiple unique configurations may be listed. A bare-bones example that trains an SRN (simple recurrent network) on the supplied CHILDES (MacWhinney, 1984) data, split into 256 docs and iterating 20 times over each:
learning_rate | bptt_steps | corpus_name      | num_types | num_parts | num_iterations |
0.01          | 7          | childes-20171213 | 4096      | 256       | 20             |
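To make the "one row per configuration" layout concrete, here is a small sketch that reads the example rows above into Python dictionaries. It assumes the pipe-delimited layout exactly as printed here; the on-disk format of the real configs file may differ, and parse_configs and coerce are hypothetical helpers, not rnnlab functions.

```python
def parse_configs(text):
    """Parse pipe-delimited rows into a list of dicts, one per configuration."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip()
    ]
    header, *body = rows
    return [dict(zip(header, values)) for values in body]


def coerce(value):
    """Convert numeric strings to int or float; leave other strings unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


# The example rows from the text above.
EXAMPLE = """
learning_rate | bptt_steps | corpus_name | num_types | num_parts | num_iterations |
0.01 | 7 | childes-20171213 | 4096 | 256 | 20 |
"""

configs = [{k: coerce(v) for k, v in row.items()} for row in parse_configs(EXAMPLE)]

# 256 docs, each iterated over 20 times.
total_passes = configs[0]["num_parts"] * configs[0]["num_iterations"]
print(configs[0])
print("document passes:", total_passes)
```

Adding a second data row to EXAMPLE yields a second dictionary, mirroring how each additional row in the configs file defines another configuration.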
Training:
rnnlab comes packaged with the corpus childes-20171213 so that you can test your implementation right out of the box. Just make sure you specify the name of the corpus in your configs file.
To train one model per configuration specified in your configs file, run the example with a command-line argument specifying the model type:
python train.py
To train 3 replicas per configuration, add the -r option:
python train.py -r3
- If rnnlab's log already contains 3 replicas of a particular configuration, training of that configuration will be skipped.
- To train it anyway, either increase the number of replicas or add an argument that turns off skipping:
python train.py -r3 --noskip
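The skipping rule described above can be sketched as a small decision function. This is an illustration of the described behaviour, not rnnlab's actual code: the function name, the log summary, and the configuration labels are all hypothetical, and treating "at least as many logged replicas as requested" as the skip condition is an assumption.

```python
def should_skip(num_logged, num_requested, noskip=False):
    """Return True if training of a configuration would be skipped.

    Assumption: a configuration is skipped once the log holds at least
    as many replicas as requested, unless skipping is turned off.
    """
    if noskip:
        return False
    return num_logged >= num_requested


# Hypothetical log summary: configuration label -> replicas already logged.
logged = {"config_a": 3, "config_b": 1}

# Equivalent in spirit to -r3: only config_b still needs training.
to_train = [name for name, n in logged.items() if not should_skip(n, 3)]
print(to_train)

# Equivalent in spirit to -r3 --noskip: nothing is skipped.
to_train_noskip = [name for name, n in logged.items() if not should_skip(n, 3, noskip=True)]
print(to_train_noskip)
```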
Analysis using the browser app
During training, hidden state activations for user-specified words (probes) are collected and saved to disk. An included browser application can visualize the data during and after training. Once you have started training a model, a bash alias is created for easy access to the browser app. In a new bash terminal, type:
python app.py
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Here you can plot various evaluation metrics across training time, SVD and t-SNE of the learned representations of probe words, model comparisons, and much more.
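For readers unfamiliar with the SVD view of probe representations, the idea can be sketched with numpy alone. This is a generic illustration, not the browser app's code: the probe words, the random stand-in for saved hidden states, the mean-centering step, and the function name svd_project are all assumptions made for the example.

```python
import numpy as np


def svd_project(activations, num_dims=2):
    """Project row vectors onto their first num_dims singular directions."""
    # Center each hidden unit so the projection reflects variation across probes.
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Thin SVD: centered = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :num_dims] * s[:num_dims]


rng = np.random.default_rng(0)
probes = ["dog", "cat", "run", "eat"]  # hypothetical probe words
hidden = rng.normal(size=(len(probes), 16))  # random stand-in for saved hidden states

coords = svd_project(hidden)
for word, (x, y) in zip(probes, coords):
    print("{}: ({:.2f}, {:.2f})".format(word, x, y))
```

Each probe word ends up with a 2-D coordinate that can be placed on a scatter plot; with real hidden states, words with similar learned representations would be expected to land near each other.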
Important Note
rnnlab is still in the early stages of development. The package is aimed primarily at enabling replication studies.
Project Information
rnnlab is released under the MIT license; the code is on GitHub, and the latest release is on PyPI.
Tested on Python 3.5.