Semi-supervised time series clustering with COBRAS
pip install cobras-ts==0.1.3
Library for semi-supervised clustering using pairwise constraints.
COBRAS supports three modes for constraint elicitation:
This package is available on PyPI:
$ pip install cobras_ts
The following dependencies are automatically installed: dtaidistance, kshape, numpy, scikit-learn.
To use the interactive GUI, install cobras_ts with the following command, which automatically installs the additional dependencies (bokeh, datashader, and cloudpickle):
$ pip install --find-links https://dtai.cs.kuleuven.be/software/cobras/datashader.html pip cobras_ts[gui]
The COBRAS algorithm can easily be run from the command line.
A cobras_ts script will be installed by pip:
$ cobras_ts --format=csv --labelcol=0 /path/to/UCR_TS_Archive_2015/ECG200/ECG200_TEST
This script is also available in the repository as cobras_ts_cli.py.
Examples can also be found in the examples subdirectory.
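The command-line call above passes --format=csv and --labelcol=0, which suggests a comma-separated file with the class label in the first column (the same layout the Python examples in this README read with labels = data[:,0]). The following is a minimal sketch of that split, using only the standard library. The sample rows and the split_labels helper are hypothetical and are not part of cobras_ts.

```python
import csv
import io

# Hypothetical three-row sample: label in column 0, series values after it.
sample = io.StringIO(
    "1,0.50,0.62,0.71\n"
    "-1,0.10,0.08,0.05\n"
    "1,0.48,0.60,0.73\n"
)


def split_labels(rows, labelcol=0):
    """Separate the label column from the series values (illustrative helper)."""
    labels, series = [], []
    for row in rows:
        values = [float(v) for v in row]
        labels.append(values[labelcol])
        # Keep every column except the label column.
        series.append(values[:labelcol] + values[labelcol + 1:])
    return labels, series


labels, series = split_labels(csv.reader(sample))
print(labels)
print(series[0])
```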
Running COBRAS_kmeans:
import numpy as np
from sklearn import metrics
from cobras_ts.cobras_kmeans import COBRAS_kmeans
from cobras_ts.labelquerier import LabelQuerier

# Query budget passed to COBRAS
budget = 100

# Column 0 holds the class label, the remaining columns hold the features
data = np.loadtxt('/home/toon/data/iris.data', delimiter=',')
X = data[:,1:]
labels = data[:,0]

# The querier is built from the ground-truth labels
clusterer = COBRAS_kmeans(X, LabelQuerier(labels), budget)
clusterings, runtimes, ml, cl = clusterer.cluster()

# Evaluate the last clustering against the ground truth
final_clustering = clusterings[-1].construct_cluster_labeling()
print(metrics.adjusted_rand_score(final_clustering,labels))
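In the example above, the pairwise constraints are answered from known labels instead of by a human. As a rough illustration of that idea, the sketch below answers a "do these two instances belong together?" query by comparing ground-truth labels. The LabelOracle class and its must_link method are hypothetical stand-ins written for this illustration. They are not the cobras_ts LabelQuerier and make no claim about its interface.

```python
class LabelOracle:
    """Hypothetical label-based oracle (not the cobras_ts LabelQuerier)."""

    def __init__(self, labels):
        self.labels = labels
        self.n_queries = 0  # number of pairwise queries answered so far

    def must_link(self, i, j):
        # Two instances must link when their ground-truth labels match;
        # otherwise the pair is a cannot-link constraint.
        self.n_queries += 1
        return self.labels[i] == self.labels[j]


oracle = LabelOracle([0, 0, 1, 2, 1])
pairs = [(0, 1), (0, 2), (2, 4)]
answers = [oracle.must_link(i, j) for i, j in pairs]
print(answers)           # [True, False, True]
print(oracle.n_queries)  # 3
```

Counting the queries makes it easy to check how much of a given budget has been used.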
Running COBRAS_kShape:
import os
import numpy as np
from sklearn import metrics
from cobras_ts.cobras_kshape import COBRAS_kShape
from cobras_ts.labelquerier import LabelQuerier

ucr_path = '/home/toon/Downloads/UCR_TS_Archive_2015'
dataset = 'ECG200'
budget = 100

# Column 0 holds the class label, the remaining columns hold the series values
data = np.loadtxt(os.path.join(ucr_path,dataset,dataset + '_TEST'), delimiter=',')
series = data[:,1:]
labels = data[:,0]

clusterer = COBRAS_kShape(series, LabelQuerier(labels), budget)
clusterings, runtimes, ml, cl = clusterer.cluster()

# Evaluate the last clustering against the ground truth
final_clustering = clusterings[-1].construct_cluster_labeling()
print(metrics.adjusted_rand_score(final_clustering,labels))
Running COBRAS_DTW:
This uses the dtaidistance package to compute the DTW distance matrix. Note that constructing this matrix is typically the most time-consuming step, and significant speedups can be achieved by using the C implementation in the dtaidistance package.
import os
import numpy as np
from dtaidistance import dtw
from sklearn import metrics
from cobras_ts.cobras_dtw import COBRAS_DTW
from cobras_ts.labelquerier import LabelQuerier

ucr_path = '/home/toon/Downloads/UCR_TS_Archive_2015'
dataset = 'ECG200'
budget = 100
alpha = 0.5
window = 10

# Column 0 holds the class label, the remaining columns hold the series values
data = np.loadtxt(os.path.join(ucr_path,dataset,dataset + '_TEST'), delimiter=',')
series = data[:,1:]
labels = data[:,0]

# Warping window: `window` percent of the series length
dists = dtw.distance_matrix(series, window=int(0.01 * window * series.shape[1]))
# Zero out the inf entries, then mirror the matrix to make it symmetric
dists[dists == np.inf] = 0
dists = dists + dists.T - np.diag(np.diag(dists))
# Convert distances to affinities
affinities = np.exp(-dists * alpha)

clusterer = COBRAS_DTW(affinities, LabelQuerier(labels), budget)
clusterings, runtimes, ml, cl = clusterer.cluster()

# Evaluate the last clustering against the ground truth
final_clustering = clusterings[-1].construct_cluster_labeling()
print(metrics.adjusted_rand_score(final_clustering,labels))
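The three matrix operations in the example above (zeroing inf entries, symmetrizing, and applying exp(-alpha * d)) can be traced on a small matrix. The sketch below repeats them in plain Python on a hypothetical 3x3 distance matrix. It assumes, as the example code implies, that only the upper triangle holds real distances and that the remaining entries are inf. Whether dtaidistance returns this layout may depend on the installed version.

```python
import math

INF = float("inf")

# Hypothetical upper-triangular distance matrix; the lower triangle is inf.
dists = [
    [0.0, 2.0, 4.0],
    [INF, 0.0, 1.0],
    [INF, INF, 0.0],
]
alpha = 0.5
n = len(dists)

# Step 1: replace the inf placeholders with 0.
cleaned = [[0.0 if d == INF else d for d in row] for row in dists]

# Step 2: mirror the upper triangle into the lower one.
# Adding the transpose doubles the diagonal, so subtract it once.
full = [
    [cleaned[i][j] + cleaned[j][i] - (cleaned[i][i] if i == j else 0.0)
     for j in range(n)]
    for i in range(n)
]

# Step 3: map distances to affinities; distance 0 gives affinity 1,
# and larger distances give affinities closer to 0.
affinities = [[math.exp(-d * alpha) for d in row] for row in full]

for row in affinities:
    print([round(a, 3) for a in row])
```

A larger alpha makes the affinity fall off faster with distance, so it controls how sharply dissimilar series are separated.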
This package uses Python3, numpy, scikit-learn, kshape and dtaidistance.
Toon Van Craenendonck at toon.vancraenendonck@cs.kuleuven.be
COBRAS code for semi-supervised time series clustering.
Copyright 2018 KU Leuven, DTAI Research Group
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.