PyTorch and Lightning compatible library that provides easy and flexible access to various time-series datasets for classification and regression tasks


License
MIT
Install
pip install torchchronos==0.0.4

torchchronos

torchchronos is an experimental PyTorch and Lightning compatible library that provides easy and flexible access to various time-series datasets for classification and regression tasks. It also provides a simple and extensible transform API to preprocess data. It is inspired by the much more complicated torchtime.

Installation

You can install torchchronos via pip:

pip install torchchronos

Usage

Datasets

torchchronos currently provides access to several popular time-series datasets, including those of the UCR/UEA archive through UCRUEADataset.

To use a dataset, you can simply import the corresponding dataset class and create an instance:

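from pathlib import Path

from torchchronos.datasets import UCRUEADataset
from torchchronos.download import download_uea_ucr
from torchchronos.transforms import PadFront

# Download the ECG5000 files into a local cache directory
cache_dir = Path(".cache") / "data"
download_uea_ucr("ECG5000", cache_dir)

# Load the dataset from the cache and apply the PadFront transform
dataset = UCRUEADataset("ECG5000", path=cache_dir, transforms=PadFront(10))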

Data Modules

torchchronos also provides Lightning compatible DataModules to make it easy to load and preprocess data. They support common use cases like (multi-)GPU training and train/test/val-splitting out of the box. For example:

from torchchronos.lightning import UCRUEADataModule
# Compose is assumed here to be exported by torchchronos.transforms
from torchchronos.transforms import Compose, PadBack, PadFront

module = UCRUEADataModule("ECG5000", split_ratio=(0.75, 0.15), batch_size=32,
                          transforms=Compose([PadFront(10), PadBack(10)]))

Analogous to the datasets above, the following data modules are currently supported, each wrapping the respective dataset:

  • torchchronos.lightning.UCRUEADataModule
  • torchchronos.lightning.TFCPretrainDataModule
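
The split_ratio argument in the example above can be read as a pair of train and validation fractions, with the remainder going to the test split. That reading is an assumption and is not confirmed here; the helper split_sizes below is hypothetical and not part of torchchronos. It only sketches the arithmetic for an illustrative dataset of 5000 samples:

```python
def split_sizes(n_samples: int, split_ratio: tuple[float, float]) -> tuple[int, int, int]:
    """Return (train, val, test) sizes; the test split takes the remainder.

    Hypothetical helper: assumes split_ratio = (train fraction, val fraction).
    """
    train_frac, val_frac = split_ratio
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    # round() avoids off-by-one results caused by floating-point products
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test


sizes = split_sizes(5000, (0.75, 0.15))
print(sizes)  # (3750, 750, 500)
```

Computing the test size as a remainder guarantees that the three splits always add up to the full dataset, whatever the rounding does.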

Transforms

torchchronos provides a flexible transform API to preprocess time-series data. For example, to normalize a dataset, you can define a custom Transform like this:

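from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions

from torchchronos.transforms import Transform

class Normalize(Transform):
    def __init__(self, mean=None, std=None):
        # Statistics can be passed in directly or estimated later with fit()
        self.mean = mean
        self.std = std

    def fit(self, data) -> Self:
        # Estimate mean and standard deviation from the given data
        self.mean = data.mean()
        self.std = data.std()
        return self

    def __call__(self, data):
        # Standardize using the stored statistics
        return (data - self.mean) / self.std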
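
A common way to use such a transform is to fit it on the training data only and then apply the stored statistics to every split. The sketch below illustrates that pattern with a dependency-free stand-in: it works on plain Python lists instead of tensors and does not inherit from torchchronos' Transform, so it only mirrors the fit/__call__ shape shown above. It uses the sample standard deviation, which matches the default of torch.Tensor.std.

```python
import statistics


class Normalize:
    """Stand-in with the same fit/__call__ shape as the Transform above."""

    def __init__(self, mean=None, std=None):
        self.mean = mean
        self.std = std

    def fit(self, data):
        # Estimate the statistics once, typically on the training split
        self.mean = statistics.fmean(data)
        self.std = statistics.stdev(data)  # sample standard deviation
        return self

    def __call__(self, data):
        if self.mean is None or self.std is None:
            raise RuntimeError("call fit() first or pass mean and std explicitly")
        return [(x - self.mean) / self.std for x in data]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 8.0]

normalize = Normalize().fit(train)  # statistics come from the training data only
train_scaled = normalize(train)
test_scaled = normalize(test)       # reuses the training mean and std
```

Reusing the training statistics for the test split keeps information about the test data out of the preprocessing step.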

Known issues

  • The dataset SpokenArabicDigits does not seem to work due to a mismatch between the TRAIN and TEST sizes
  • The dataset UrbanSound does not seem to work due to missing ts files

Roadmap

The following features are planned for future releases of torchchronos:

  • Support for additional time-series datasets, including:
    • Energy consumption dataset
    • Traffic dataset
    • PhysioNet Challenge 2012 (in-hospital mortality)
    • PhysioNet Challenge 2019 (sepsis prediction)
  • Additional transform classes, including:
    • Resampling
    • Missing value imputation

If you have any feature requests or suggestions, please open an issue on our GitHub page.