`ts-eval` Time Series analysis and evaluation tools

A set of tools to make time series analysis easier.

🧩 Current features

N-step ahead time series evaluation – using a Jupyter widget.
Friedman / Nemenyi rank test (posthoc) – to see which model statistically performs better.
Relative Metrics – rMSE, rMAE + Forecasted Value analogues.
Prediction Interval Metrics – MIS, rMIS, FVrMIS
Fixed fourier series generation – fixed in time according to pandas index
Naive/Seasonal models for baseline predictions (with prediction intervals)
Statsmodels n-step evaluation – helper functions to evaluate n-step ahead forecasts using Statsmodels models or naive/seasonal naive models.

👩🏾‍🎨 Widget Preview

In:

TSMetrics(target, sm_seas, default)
.use_reference(snaive)
.for_horizons(0, 1, 5, 23)
.for_time_slices(time_slices.all, time_slices.weekend)
.with_description()
.with_prediction_rankings(mtx.FVrMSE, mtx.FVrMIS)
.with_predictions_plot()
.show()

Out:

👩🏾‍🚒 Demo

For a more elaborate example, please check out the Demo Notebook.

Alternatively, check out interactive Binder demo

🤦🏾‍ Motivation

While working on a long term time series analysis project, I had a need to summarize and store performance metrics of different models and compare them. As it's daunting to do this across dozens of notebooks, I huddled over some code to do what I want in a few lines of code.

👩🏾‍🚀 Installation

  pip install ts-eval

📋 Release Planning:

Release 0.3
- remove collection of deps in style [tests_and_bla_bla] to [tests,bla]
- links to papers – AvgRelMAE (Davydenko and Fildes, 2013); link to Nemenyi paper / implementations
- make graphs with PIs more narrow on 0,1,.. steps as there's too much space left (with an option to turn this off).
- better API for the end user – minimize interaction with xarray
- pep517 build / wheels / better setup.py as per Hynek
- travis: add 3.8 default python when it's available
- docs: supported metrics & API options
- Maybe use api like Summary in statsmodels MLEModel class, it has extend methods and warn/info messages
- pretty legend for lots like here https://studywolf.wordpress.com/2017/11/21/matplotlib-legends-for-mean-and-confidence-interval-plots/
- Look for TODOs
- changable colors
- turn off colored display option
- a nicer API for raw metrics container
- codacy badge
- boxplots to compare models (as an alternative)
- violin plots to compare predictions – areas can be colored, different metrics on left and right (like relative...)
Release 0.4
- holiday/fourier features model
- fix viz module to have less of important stuff
- a gif with project visualization
- check shapes of input arrays (target vs preds), now it doesn't raise an error
- Baseline prediction using target dataset (without explicit calculation, but losing some time points)

💡 Ideas

components
- Graph: Visualize outliers from confidence interval
- Multi-comparison component: scikit_posthocs lib or homecooked?
- inspect true confidence interval coverage via sampling (was done in postings around bayesian dropout sampling)
- xarrays: compare if compared datasets are actually equal (offets by dates, shapes, maybe even hashing)
- bin together step performance, like steps 0-1, 2-5, 6-12, 13-24
- highlight regions using a mask (holidays, etc.)
- option to view interactively points using widget (plotly)?
- diagnostics: bias to over / underestimate points
- animated graphs for change in seasonality
features
- example notebook for fourier?
- tests for fourier
- nint generation
utils:
- model adaptor (for different models, generic) which generates 3d prediction dataset. For stastmodels using dyn forecast or kalman filter
- future importance calculator, but only if I can manipulate input features
- feature selection using PACF / prewhiten?
project
- more defensive style (add arg checks, so it's easier to understand what is going on)
- docstrings
- circleci
- https://timothycrosley.github.io/portray/ for docs
sMAPE & MASE can be added for the jupyter evaluation tables
? Residual stats: since I have residuals => Ljung-Box, Heteroscedasticity test, Jarque-Bera – like in statsmodels results, but probably these stats were inspected already by the user... and on which step should they be computed then?

🤹🏼‍♂️ Development

Recommended development workflow:

pipenv install -e .[dev]
pipenv shell

The library doesn't use Flit/Poetry, so the suggested workflow is based on Pipenv (as per https://github.com/pypa/pipenv/issues/1911). Pipfile* are ignored in the .gitignore.

ts-eval
Release 0.2.3

Release 0.2.3

0.2.3

0.2.2

0.2.1

0.2.0

0.1.0

0.0.1

Documentation

`ts-eval` Time Series analysis and evaluation tools

🧩 Current features

👩🏾‍🎨 Widget Preview

👩🏾‍🚒 Demo

🤦🏾‍ Motivation

👩🏾‍🚀 Installation

📋 Release Planning:

💡 Ideas

See also

🤹🏼‍♂️ Development

Stats

Development practices

Releases

Contributors

ts-eval Release 0.2.3

Release 0.2.3 Toggle Dropdown 0.2.3 0.2.2 0.2.1 0.2.0 0.1.0 0.0.1

Documentation

ts-eval Time Series analysis and evaluation tools

🧩 Current features

👩🏾‍🎨 Widget Preview

👩🏾‍🚒 Demo

🤦🏾‍ Motivation

👩🏾‍🚀 Installation

📋 Release Planning:

💡 Ideas

See also

🤹🏼‍♂️ Development

Stats

Development practices

Releases

Contributors

ts-eval
Release 0.2.3

Release 0.2.3

0.2.3

0.2.2

0.2.1

0.2.0

0.1.0

0.0.1

`ts-eval` Time Series analysis and evaluation tools