divina: scalable and automatable multi-horizon forecasting toolkit
What is it?
divina targets four main problems that professional forecasters face:
- Multi-horizon forecasting often requires writing the same complex code over and over for training, prediction, validation and visualization
- Many forecasting model implementations neither follow the standard scikit-learn interface nor scale well to large datasets
- Different forecasting models often benefit from the same complex, time-sensitive engineered features
- Because of the above three, deploying and scaling multi-horizon forecasting ensembles is considerably more complex than deploying a typical machine learning pipeline
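To make the first problem concrete: in the common "direct" approach to multi-horizon forecasting, each horizon h gets its own training target, which is the series shifted h steps into the future. The minimal sketch below (plain Python; make_horizon_targets is a hypothetical helper, not part of divina) builds those shifted targets. This per-horizon bookkeeping is the kind of code that otherwise gets rewritten separately for training, prediction, validation and visualization.

```python
from typing import Dict, List, Optional, Sequence


def make_horizon_targets(
    series: Sequence[float], horizons: Sequence[int]
) -> Dict[int, List[Optional[float]]]:
    """For each horizon h, return the target y[t + h] aligned to row t.

    Rows whose target would fall past the end of the series are None,
    because that future value has not been observed yet.
    """
    n = len(series)
    targets: Dict[int, List[Optional[float]]] = {}
    for h in horizons:
        if h < 1:
            raise ValueError("horizons must be positive integers")
        targets[h] = [series[t + h] if t + h < n else None for t in range(n)]
    return targets


# Hypothetical example data: five consecutive observations.
sales = [10.0, 12.0, 13.0, 15.0, 16.0]
for h, target in make_horizon_targets(sales, [1, 2, 3]).items():
    print(h, target)
```

The trailing None values mark rows with no observed future value, which is why longer horizons leave fewer usable training rows.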
Main Features
divina addresses the aforementioned problems by:
- Providing a single Python object with a simple interface that abstracts away the complexities of multi-horizon training, prediction, validation and visualization
- Providing a library of interface-standardized model ensemble candidates that can be mixed and matched depending on the forecasting problem
- Providing consistent, efficient implementations of popular time-series engineered features
- Providing built-in integration with Dask, for efficient cloud-based scaling, and with Prefect, for automation, fault tolerance, queue management and artifact persistence
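As an illustration of the single-object idea, here is a minimal sketch of a scikit-learn-style forecaster that trains and predicts several horizons through one fit/predict pair. It is an assumption-laden stand-in, not divina's actual API: the class name DirectMultiHorizonForecaster is hypothetical, and each horizon uses a plain simple linear regression on the latest observed value rather than divina's model ensembles or engineered features.

```python
from typing import Dict, Sequence, Tuple


class DirectMultiHorizonForecaster:
    """Hypothetical sketch: one object that trains and predicts every horizon.

    For each horizon h it fits a separate simple linear regression
    y[t + h] ~ intercept + slope * y[t] (the "direct" strategy).
    """

    def __init__(self, horizons: Sequence[int]):
        if not horizons or any(h < 1 for h in horizons):
            raise ValueError("horizons must be a non-empty list of positive integers")
        self.horizons = list(horizons)
        # Maps horizon -> (intercept, slope), filled in by fit().
        self.coef_: Dict[int, Tuple[float, float]] = {}

    def fit(self, series: Sequence[float]) -> "DirectMultiHorizonForecaster":
        for h in self.horizons:
            x = list(series[:-h])  # feature: value at time t
            y = list(series[h:])   # target: value at time t + h
            if len(x) < 2:
                raise ValueError(f"series too short to fit horizon {h}")
            mean_x = sum(x) / len(x)
            mean_y = sum(y) / len(y)
            var_x = sum((xi - mean_x) ** 2 for xi in x)
            cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
            # Ordinary least squares slope; fall back to 0 for a constant feature.
            slope = cov_xy / var_x if var_x else 0.0
            self.coef_[h] = (mean_y - slope * mean_x, slope)
        return self

    def predict(self, series: Sequence[float]) -> Dict[int, float]:
        if not self.coef_:
            raise RuntimeError("call fit() before predict()")
        last = series[-1]  # forecast every horizon from the latest observation
        return {h: a + b * last for h, (a, b) in self.coef_.items()}


history = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
forecaster = DirectMultiHorizonForecaster(horizons=[1, 2]).fit(history)
print(forecaster.predict(history))  # {1: 13.0, 2: 15.0}
```

Keeping one fitted model per horizon behind a single fit/predict pair is what lets callers ignore the per-horizon loop. A production version would replace the single-value feature with engineered features such as lags and rolling means.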
Roadmap
Current development priorities and improvements slated for the next and beta releases are:
- Addition of visualization methods that produce commonly-required charts via Highcharts
- Additional machine learning model options, such as XGBoost and CNNs
- Additional boosting model options, such as RNNs, LSTMs, ARIMA, SARIMA, etc.
- Addition of more realistic test cases, useful error messages and robust documentation
- Addition of GPU support via CUDA, cuDF and cuML
Where to get it
The source code is currently hosted on GitHub at: https://github.com/secrettoad/divina
Documentation
divina's documentation is available here.
Binary installers for the latest released version are available at the Python Package Index (PyPI):
pip install divina
Dependencies
- dask - Adds support for arbitrarily large datasets via remote, parallelized compute
- dask-ml - Provides distributed-optimized implementations of many popular models
- s3fs - Allows for easy and efficient access to S3
- pyarrow - Enables persistence of datasets as storage- and compute-efficient Parquet files
- prefect - Enables task orchestration, tracking and persistence
Testing
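For local integration testing, run the following commands in order. They pull and start the MinIO storage and Prefect containers that the tests require (the -d flag runs each container in the background so the next command can proceed), then run the test suite.
docker pull jhurdle/divina-storage
docker pull jhurdle/divina-prefect
docker run -d -p 9000:9000 jhurdle/divina-storage
docker run -d -p 4200:4200 jhurdle/divina-prefect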
pytest divina/divina/tests
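Containers can take a few seconds to start accepting connections after they are launched, so it may help to wait for the published ports before running pytest. The helper below is a hypothetical convenience sketch, not part of divina; it uses only the Python standard library and assumes the containers listen on localhost ports 9000 (storage) and 4200 (Prefect), as published by the docker run commands above.

```python
import socket
import time
from typing import Dict, List

# Assumed port mapping, taken from the docker run commands (hypothetical names).
TEST_SERVICES: Dict[str, int] = {"storage": 9000, "prefect": 4200}


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def unreachable_services(
    services: Dict[str, int], host: str = "localhost", timeout: float = 30.0
) -> List[str]:
    """Return the names of services whose port never accepted a connection."""
    return [name for name, port in services.items() if not wait_for_port(host, port, timeout)]
```

For example, call unreachable_services(TEST_SERVICES) before invoking pytest and stop if the returned list is non-empty.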
License
Background
Work on divina started at Coysu Consulting (a technology consulting firm) in 2020 and has been under active development since then.
Getting Help
For usage questions, the best place to ask is Stack Overflow.
Discussion and Development
Most development discussions take place on GitHub in this repo.
Contributing to divina
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
If you are simply looking to start working with the divina codebase, go to the "Issues" tab on GitHub and look through the open issues for one that interests you.