seriesdistancematrix

Flexible time series analysis library implementing Matrix Profile related functionality.


Keywords
time, series, matrix, profile, contextual, radius, distance, motif, discord
License
MIT
Install
pip install seriesdistancematrix==0.3.1

Documentation

Series Distance Matrix

This is a Python 3 library for performing (time) series analysis using the Series Distance Matrix, a framework that bundles various Matrix Profile related techniques. These techniques can be used for answering questions relating to pattern similarity in series. Some example applications include:

  • finding motifs in series (finding the best matching windows)
  • finding discords in series (finding the worst matching windows)
  • finding repetitions in series
  • visualizing series
  • finding changing patterns
  • ...

The Series Distance Matrix is a generalization of the Matrix Profile that splits the generation and consumption of the all-pair subsequence distances, putting the focus on the distance matrix itself. This allows for easier and more flexible experiments by freely combining components and eliminates the need to re-implement algorithms to combine techniques in an efficient way.
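The split between generating and consuming distances can be illustrated with a brute-force sketch in plain NumPy. This is not the library's implementation or API: the function names, the exclusion zone of half the window length, and the assumption that no window is constant (non-zero standard deviation) are all choices made here for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def znorm_distance_matrix(series, m):
    """Generation: all-pair z-normalized Euclidean distances between windows of length m."""
    windows = sliding_window_view(series, m)  # shape: (n - m + 1, m)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    z = (windows - mean) / std  # assumes std > 0 for every window
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def matrix_profile(dist, exclusion):
    """Consumption: per-column minimum, ignoring trivial matches near the diagonal."""
    d = dist.copy()
    idx = np.arange(d.shape[0])
    trivial = np.abs(idx[:, None] - idx[None, :]) <= exclusion
    d[trivial] = np.inf
    return d.min(axis=0), d.argmin(axis=0)


# Random series with the same pattern planted at positions 30 and 140
rng = np.random.default_rng(0)
series = rng.normal(size=200)
pattern = np.sin(np.linspace(0, 2 * np.pi, 20))
series[30:50] = pattern
series[140:160] = pattern

dist = znorm_distance_matrix(series, m=20)           # generate once
profile, index = matrix_profile(dist, exclusion=10)  # consume (other consumers could reuse dist)
best = int(np.argmin(profile))
print(best, int(index[best]), profile[best])
```

Because the distance matrix is produced separately, a different consumer (for example one that keeps row-wise minima over a restricted region) could be applied to the same `dist` without recomputing any distances. The brute-force version needs memory quadratic in the series length, so it only serves to show the idea.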

The following core techniques are implemented:

  • Z-normalized Euclidean distance (including noise elimination)
  • Euclidean distance
  • (Left/Right) Matrix Profile
  • Multidimensional Matrix Profile
  • Contextual Matrix Profile
  • Radius Profile
  • Streaming and batch calculation
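To make the difference between the first two distance measures concrete, here is a minimal NumPy sketch, independent of this library. The helper names are hypothetical, and the noise elimination mentioned above is not covered.

```python
import numpy as np


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length windows."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def znorm_euclidean(a, b):
    """Euclidean distance after z-normalizing each window (assumes non-constant windows)."""
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return float(np.sqrt(np.sum((za - zb) ** 2)))


t = np.linspace(0, 2 * np.pi, 50)
a = np.sin(t)
b = 3.0 * np.sin(t) + 5.0  # same shape, different scale and offset

d_plain = euclidean(a, b)
d_znorm = znorm_euclidean(a, b)
print(d_plain, d_znorm)
```

The plain distance is large because of the offset and scale, while the z-normalized distance is zero up to floating-point error. Z-normalization is therefore the usual choice when only the shape of a pattern matters, and plain Euclidean distance when absolute values matter.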

The following Matrix Profile related techniques are implemented:

  • Valmod: find the top-1 motif in a series for each subsequence length in a given range
  • Ostinato: find the top-1 (k of n) consensus motif in a collection of series
  • Anytime Ostinato: find the radius profile for a collection of series

When using this library for academic purposes, please cite:

@article{series_distance_matrix,
title = "A generalized matrix profile framework with support for contextual series analysis",
journal = "Engineering Applications of Artificial Intelligence",
volume = "90",
pages = "103487",
year = "2020",
issn = "0952-1976",
doi = "https://doi.org/10.1016/j.engappai.2020.103487",
url = "http://www.sciencedirect.com/science/article/pii/S0952197620300087",
author = "Dieter De Paepe and Sander Vanden Hautte and Bram Steenwinckel and Filip De Turck and Femke Ongenae and Olivier Janssens and Sofie Van Hoecke"
}

Installing

To install from source, clone this repository and run:

python setup.py clean build install

or, for development (this links to the source code rather than installing a copy of the library):

python setup.py develop

Usage

The basic workflow goes as follows:

  • You have one or two time series, each with one or more channels (shape: num_channels x num_measurements). When using two time series, their number of channels should match.
  • You select generators, each of which processes a single channel. A generator knows how to create the distance matrix for its channel of the time series.
  • You select consumers to handle the output of the generators. Some consumers work on a single output; others can work on multiple outputs. The main goal of a consumer is to keep track of the relevant information in the distance matrix in an efficient way.
  • You create a calculator and specify how much of the data you want processed. When the calculation completes, the output is available in the consumers.

Example

# Jupyter notebook magic; remove this line when running as a plain Python script
%matplotlib inline

# Imports
import numpy as np
import matplotlib.pyplot as plt

from distancematrix.generator.znorm_euclidean import ZNormEuclidean  # Generators live in the generator package
from distancematrix.consumer.matrix_profile_lr import MatrixProfileLR  # Consumers live in the consumer package
from distancematrix.calculator import AnytimeCalculator

# Create a one-dimensional series with 2 artefacts
data = np.array([
    np.sin(np.linspace(0,20,1000))*0.2 + np.random.rand(1000) * 0.1
])

data[0, 100:120] += np.linspace(0.1, 0.5, 20)
data[0, 720:740] += np.linspace(0.1, 0.5, 20)

plt.plot(data[0])
plt.show()

# Setup generator, consumer and calculator
m = 100  # Subsequence length

calc = AnytimeCalculator(m, data)  # One series passed => self-join

gen_0 = calc.add_generator(0, ZNormEuclidean(noise_std=0.))  # Generator 0 works on channel 0
cons_0 = calc.add_consumer([0], MatrixProfileLR())  # Consumer 0 works on generator 0

# Calculate
calc.calculate_diagonals(print_progress=True, partial=1.)

# Admire the results
min_idx = np.argmin(cons_0.matrix_profile())
match_idx = cons_0.profile_index()[min_idx]

plt.figure(figsize=(15,5))

plt.subplot(2, 1, 1)
plt.title("Data")
plt.plot(data[0])
plt.plot(range(min_idx, min_idx + m), data[0, min_idx:min_idx + m], label="Motif")
plt.plot(range(match_idx, match_idx + m), data[0, match_idx:match_idx + m], label="Best match")
plt.gca().set_xlim((0, data.shape[1]))
plt.legend()

plt.subplot(2, 1, 2)
plt.title("Matrix Profile")
plt.plot(cons_0.matrix_profile())
plt.vlines(min_idx, 0, 10, label="Minimum")
plt.gca().set_xlim((0, data.shape[1]))
plt.legend()

plt.show()
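The example above locates the top motif through the minimum of the Matrix Profile. The top discord (the worst matching window) is found the same way through the maximum. The helper below is a hypothetical sketch that works on any 1D array of profile values, such as the one returned by cons_0.matrix_profile(). It skips non-finite entries, on the assumption that windows without a computed match may be stored as infinity or NaN.

```python
import numpy as np


def top_discord(profile):
    """Return the index of the largest finite value in a 1D matrix profile."""
    profile = np.asarray(profile, dtype=float)
    # Replace inf/NaN by -inf so they can never be selected as the maximum
    finite = np.where(np.isfinite(profile), profile, -np.inf)
    return int(np.argmax(finite))


# Stand-in profile values, not output of the library
example_profile = np.array([0.4, 0.3, 2.1, np.inf, 0.2])
discord_idx = top_discord(example_profile)
print(discord_idx)
```

With the example above, the window starting at the returned index could be plotted in the same way as the motif, using data[0, discord_idx:discord_idx + m].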