dm-nevis

A benchmark for continual learning.


Keywords
benchmark, continual-learning, efficient-learning, jax, pytorch
License
Apache-2.0
Install
pip install dm-nevis==1.0.1

Documentation

🏝️ NEVIS'22


NEVIS’22 is a benchmark for measuring the performance of algorithms in the field of continual learning. Please see the accompanying paper for more details.

Within this Python package, we provide three components:

  1. Library code to download and post-process datasets that are not available within tfds, so that the stream used in the paper can be replicated.
  2. A package to combine the NEVIS’22 datasets into a stream, and robustly evaluate learners using the evaluation protocol proposed in the NEVIS’22 paper.
  3. Baseline learners implemented in JAX and PyTorch. The JAX learners are identical to those used for the figures in the paper; the PyTorch learners are provided as examples.

NEVIS’22 is composed of 106 tasks, sorted chronologically and extracted from publications randomly sampled from the online proceedings of major computer vision conferences over the past three decades. Each task is a supervised classification task, the most well-understood setting in machine learning. The challenge is to automatically transfer knowledge across related tasks in order to achieve higher performance, or greater efficiency, on the next task.

By construction, NEVIS’22 is reproducible, diverse, and at a scale sufficiently large to test state-of-the-art learning algorithms. The task selection process does not favor any particular approach, but merely tracks what the computer vision community has deemed interesting over time. NEVIS’22 is not just about data; it is also about the methodology used to train and evaluate learners. We evaluate learners on their ability to learn future tasks, as measured by the trade-off between error rate and compute (in floating-point operations). In NEVIS’22, achieving a lower error rate is not by itself sufficient if it comes at an unreasonable computational cost. Instead, we incentivise models that are both accurate and efficient.

You can read more about NEVIS'22 in our paper and our blog post.

0. Dependencies

Please follow these steps, and read sections 1 and 2 in detail before launching anything.

  • Our datasets use the TensorFlow(-Datasets) API. Our JAX learners use TensorFlow and JAX, and our PyTorch learners use PyTorch. Each component (datasets, JAX learners, and PyTorch learners) has its own requirements.txt, which you can install with pip using a Python version above 3.8 (see the setup sketch after this list).

  • It is also possible to run the code directly using the provided Dockerfiles. See here for installing Docker.

  • Some datasets are downloaded from Kaggle. See the Kaggle website for configuring your credentials, and place them in the folder ~/.kaggle, as in the sketch below.
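As a concrete first-time setup, the following sketch covers both steps (the requirements.txt path and the token location are illustrative; the Kaggle API token is downloaded from your Kaggle account page):

# Install the dependencies of the component you want to run; each of
# datasets, JAX learners and PyTorch learners ships its own requirements.txt.
pip install -r requirements.txt

# Configure Kaggle credentials; the token is a JSON file of the form
#   {"username": "<your-username>", "key": "<your-api-key>"}
mkdir -p ~/.kaggle
cp /path/to/kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json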

1. Replicating the NEVIS'22 stream

In NEVIS'22, we train and evaluate on streams. Each stream is a sequence of datasets. Some streams contain a large number of datasets (up to 106), allowing us to evaluate large-scale continual learning.

There are three different sources for datasets in NEVIS'22:

  1. Datasets on Tensorflow-Datasets (TFDS): they will be downloaded automatically when needed

  2. Custom dataset downloaders: you need the ./build_dataset.sh script

  3. Manual dataset download: you need to download data yourself

Note that we do not host or distribute these datasets; instead, we provide URLs to their original sources to help you download them at your own risk. We do not vouch for their quality or fairness, or claim that you have a license to use them. It is your responsibility to determine whether you have permission to use a dataset under its license. If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset URL to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

We do our best to keep dataset URLs up-to-date. If a dataset doesn't download, please contact the dataset owners and open an issue to let us know. If a dataset doesn't get fixed by its owners, we will remove it from our benchmark.

1.1. TFDS Datasets

Different streams are available, each made of a sequence of datasets. When iterating over the datasets of a stream, TFDS datasets are downloaded on-the-fly if they are not already present.

1.2. Custom Dataset Downloaders

Many datasets implemented in Nevis can be downloaded automatically. This has to be done in advance of training, with the script ./build_dataset.sh:

$ ./build_dataset.sh -h
Usage:
        -d <DATASET_NAME> | Dataset name
        -s <STREAM_NAME>  | Stream variant name (FULL|SHORT|TINY|DEBUG|...)
        -b                | Build docker before running
        -e                | Develop mode where code is mounted
        -h                | Help message

If running for the first time, pass the option -b alongside other flags to build the Docker image (nevis-data). The develop mode is useful if you need to change the codebase (e.g. to add a new dataset) and want to debug quickly without rebuilding the Docker image every time (you still need to build it once in develop mode: -b -e).
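For example, a typical first run that builds the Docker image and downloads a single dataset looks like:

# Build the nevis-data Docker image (-b) and download the Coil100 dataset.
./build_dataset.sh -b -d coil100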

See in dm_nevis/streams/nevis_stream.py the enum NEVISStreamVariant for the full list of downloadable streams.

Some datasets are downloaded from Kaggle. See the Kaggle website for how to configure your credentials, and place them in the folder ~/.kaggle (see section 0 above).

1.3. Manual Download

ImageNet is a TFDS dataset, but it needs to be downloaded manually. Please check the instructions.

For reference, TFDS looks for datasets in the directory defined by the environment variable TFDS_DATA_DIR.
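For example, to cache TFDS datasets under a custom location (the path below is only an illustration):

# Point TFDS at a custom dataset cache directory.
export TFDS_DATA_DIR=~/nevis/tfds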

2. Experiments

Each experiment consists of training a model on a stream of multiple datasets; a single launch thus trains a model on each dataset of the stream. We provide two main learner paradigms: independent and finetune-from-previous. In the former, we create a new randomly initialized model for each dataset. In the latter, a model is initialized for the first dataset of the stream and then tuned sequentially on all subsequent datasets.

To launch an experiment, run:

./launch_local.sh <X> example

Here, <X> is the framework to use (jax or torch), and the second argument is the config to use.

Note that for the torch version, if you want to run on GPU instead of CPU, you need to provide the GPU id with --device <GPU_ID>. By default, the code uses the id -1 to denote CPU.
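For example, assuming launch_local.sh forwards extra flags to the experiment (a sketch, to be checked against the script):

# Run the PyTorch example config on GPU 0; --device -1 (the default) means CPU.
./launch_local.sh torch example --device 0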

Output directory for metrics

By default the metrics computed by experiments_<X>/metrics/nevis_metrics.py will be written in ./nevis_output_dir.

You can specify a different path by overriding the environment variable NEVIS_DEFAULT_OUTPUT_DIR.
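For example (the path is only an illustration):

# Write experiment metrics to a custom directory.
export NEVIS_DEFAULT_OUTPUT_DIR=/tmp/nevis_metrics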

Metrics visualization with TensorBoard

The TensorBoard events file will be saved to ~/nevis/tensorboard. Each run will create a folder below this directory named with the date and time when the run was launched.

TensorBoard can be launched with the following command:

tensorboard --logdir=~/nevis/tensorboard

You will need to have TensorBoard installed outside the Docker container, using:

pip install tensorboard

The TensorBoard dashboard contains the following groups of plots:

  • benchmark_metrics contains metrics from prediction events across the stream, where the x-axis is the index (0-based) of the most recent training event.
  • train_event_<i> contains training and validation metrics for the training event with index i.

3. Example

Let's take an example learner (which always returns zeros) that we will "train" on the DEBUG stream, made of the Pascal VOC 2007 and Coil100 datasets.

Pascal VOC 2007 is a TFDS dataset, so it will be downloaded automatically when needed.

First, we download the Coil100 dataset:

./build_dataset.sh -e -b -s debug

Note that since the DEBUG stream only needs to download Coil100, we could also have used -d coil100 instead of -s debug. As you can see in the script build_dataset.sh, we download data to ~/nevis. You can change this directory by overriding the variable LOCAL_DIR in the script, as sketched below.
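Assuming the script reads LOCAL_DIR from the environment rather than hard-coding it (if not, edit the variable inside build_dataset.sh), a one-off override could look like:

# Hypothetical one-off override of the download directory.
LOCAL_DIR=/data/nevis ./build_dataset.sh -e -b -s debug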

Then, we launch the example learner:

./launch_local.sh jax example

Note that the stream DEBUG is already specified in the config ./experiments_jax/configs/example.py.

4. Baselines

We provide several baselines, defined in the learners/ directory with configurations in the configs/ directory. Note that the same approach might have multiple configurations.

As a reminder, to run the configuration configs/X.py, run ./launch_local.sh jax X.

We provide the following baselines:

  • Independent, in configs/finetuning_ind.py, where each dataset is learned by an independent model.
  • Previous, in configs/finetuning_prev.py, where we learn each dataset sequentially and initialize its parameters from the parameter vector learned on the previous task.
  • Dynamic, in configs/finetuning_dknn.py, where the initialization for task T is chosen among the models trained on the dataset most similar to the current one. This baseline performs hyperparameter tuning while learning the task, following the protocol described in our tech report.

Variants are also provided, such as the cheaper configuration in configs/cheap_finetuning_dknn.py, which uses a smaller network and fewer hyper-parameter search trials. These are the best entry point for people who only have access to one or a few GPUs.
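Following the launch convention above, the cheap variant can for instance be run as:

# Run the cheaper dynamic-kNN finetuning baseline (suited to a single GPU).
./launch_local.sh jax cheap_finetuning_dknn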

It is also possible to run a pretrained model on the Nevis stream. First, train your own pretrained model; for example, to pretrain on ImageNet, run the configuration configs/pretrain_imagenet.py. Collect the resulting checkpoint (see the configuration file for where it is saved), then use this checkpoint with configs/finetuning_ind_pretrained.py.
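Following the same convention, a sketch of this two-step workflow (how the checkpoint is picked up is determined by the configuration files):

# 1) Pretrain on ImageNet (requires the manual ImageNet download, see section 1.3).
./launch_local.sh jax pretrain_imagenet
# 2) Finetune an independent model per task, initialized from that checkpoint.
./launch_local.sh jax finetuning_ind_pretrained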

5. Code paths

The code is structured as follows:

|--- dm_nevis/
|    |--- benchmarker/
|    |--- datasets_storage/
|    |--- streams/
|--- experiments_jax/
|    |--- launch.py
|    |--- experiment.py
|    |--- configs/
|    |--- learners/
|    |--- metrics/
|    |--- environment/
|    |--- training/
|--- experiments_torch/
|    |--- launch.py
|    |--- experiment.py
|    |--- configs/
|    |--- learners/
|    |--- metrics/
|    |--- environment/
|    |--- training/

dm_nevis/ is the library of the benchmark, containing the benchmarker/ library, which implements the evaluation protocol used in the paper. datasets_storage/ is a package to support the downloading and preparation of datasets, and streams/ is a package defining different streams.

There are two directories containing baseline model implementations, one for JAX (experiments_jax) and one for PyTorch (experiments_torch). In each, launch.py is the Docker entrypoint, experiment.py is the module where all the execution happens, configs/ provides the hyperparameters for each learner, learners/ implements the learners (note: in some cases, there are different configs for the same learner), metrics/ implements the metrics used in NEVIS'22, environment/ provides the logger and checkpointer, and training/ provides learner-agnostic utilities such as the heads and the backbone, as well as a FLOPs counter.

Contact

If you wish to contact us, please raise a GitHub issue.

If you are using the NEVIS'22 benchmark, please cite the following paper:

@article{bornschein2022nevis,
  author={Bornschein, J\"org and Galashov, Alexandre and Hemsley, Ross and Rannen-Triki, Amal and Chen, Yutian and Chaudhry, Arslan and He, Xu Owen and Douillard, Arthur and Caccia, Massimo and Feng, Qixuang and Shen, Jiajun and Rebuffi, Sylvestre-Alvise and Stacpoole, Kitty and de las Casas, Diego and Hawkins, Will and Lazaridou, Angeliki and Teh, Yee Whye and Rusu, Andrei A. and Pascanu, Razvan and Ranzato, Marc'Aurelio},
  title={NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research},
  journal={CoRR},
  volume={abs/2211.11747},
  year={2022},
  url={https://arxiv.org/abs/2211.11747},
  eprinttype={arXiv}
}