Change Detection in Categorical Evolving Data Streams - CDCStream
Paper available at http://dx.doi.org/10.5445/IR/1000155196; cite as (BibTeX):
@techreport{TratBenderOvtcharova2023_1000155196,
author = {Trat, Martin and Bender, Janek and Ovtcharova, Jivka},
year = {2023},
title = {Sensitivity-Based Optimization of Unsupervised Drift Detection for Categorical Data Streams},
doi = {10.5445/IR/1000155196},
institution = {{Karlsruher Institut für Technologie (KIT)}},
issn = {2194-1629},
series = {KIT Scientific Working Papers},
keywords = {unsupervised concept drift detection, data stream mining, productive artificial intelligence, categorical data processing},
pagetotal = {10},
language = {english},
volume = {208}
}
Implementation of an augmented version of Dino Ienco's algorithm CDCStream (https://doi.org/10.1145/2554850.2554864).
Installation
Requirements
- WEKA v3.8.6 or greater: Installation, GitHub
- Without this requirement, code execution fails.
- Java
- Download and install Java 11 (OpenJDK 11), e.g. from Red Hat (more recent versions might work as well).
- Note that I experienced issues using Temurin (via adoptium.net).
- Make sure that the Java folder (path including /bin at the end) is added to the environment variable PATH.
- Some problems during Python package installation can be solved by correctly setting the environment variable JAVA_HOME. Set it to point to the respective Java folder (e.g. /usr/lib/jvm/java-11-openjdk-amd64), NOT including /bin or further components at the end.
- Without this requirement, attempting to install package javabridge might fail.
- Build tools
- Ubuntu: Based on the python-weka-wrapper3 documentation, fulfill build requirements.
sudo apt-get install build-essential python3-dev
- Windows: Microsoft Visual C++ 14.0 or greater. For this, download Build Tools from Microsoft and install those (installation of Core Features for C++ Build Tools, C++ 2019 Redistributable Update, Windows 10 SDK and MSVC v142 (or greater) should suffice; a subsequent restart might be necessary).
- Without these requirements, attempting to install package javabridge might fail.
- Python >=3.7
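The Java-related requirements above (PATH containing the Java /bin folder, JAVA_HOME pointing to the Java folder without /bin) can be checked before attempting the installation. The following is a minimal sketch using only the Python standard library; the function name check_java_environment is hypothetical and not part of the cdcstream package, and the check only covers the two environment variables, not the Java version.

```python
import os
import shutil
from pathlib import Path


def check_java_environment(env=None):
    """Return a list of problems found in the Java setup (empty if none found).

    Hypothetical helper for illustration; not part of cdcstream.
    """
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif Path(java_home).name == "bin":
        # JAVA_HOME must point to the Java folder itself, without /bin
        problems.append("JAVA_HOME should not end with the bin subfolder")

    # shutil.which searches the given PATH string for an executable named "java"
    if shutil.which("java", path=env.get("PATH", "")) is None:
        problems.append("no java executable found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_java_environment():
        print("WARNING:", problem)
```

Run without arguments, the function inspects the current process environment; passing a dictionary allows checking a planned configuration before changing the system settings.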
Setup
- Use pip (after installing above-stated requirements!):
python -m pip install cdcstream
- @poetry users: trouble installing python-javabridge? --> See Development section
- First usage of the cdcstream package should automatically add all required WEKA packages.
If this does not succeed: Manually add package DilcaDistance v1.0.2 or greater to WEKA:
- Start WEKA GUI
- Select Tools/Package manager and install the latest version of DilcaDistance (the dependency fastCorrBasedFS should be installed after confirming the prompted request). It might be necessary to click the Toggle load button with DilcaDistance selected in order to get Yes in the Loaded column.
Example
import numpy as np
import pandas as pd
from cdcstream.dilca_wrapper import dilca_workflow
from cdcstream import CDCStream, tools
N_BATCHES = 50
tools.manage_jvm_start() # start a Java VM in order to integrate WEKA
# callback invoked by the drift detector when it raises an alert
def alert_cbck(alert_code, alert_msg):
    if not alert_msg:
        alert_msg = 'no msg'
    print(f'{alert_msg} (code {alert_code})')

# instantiate drift detector
c = CDCStream(
    alert_callback=alert_cbck,
    summary_extractor=dilca_workflow,
    summary_extractor_args={'nominal_cols': 'all'},
    factor_warn=2.0,
    factor_change=3.0,
    factor_std_extr_forg=0,
    cooldown_cycles=0
)

# create random data (will be interpreted as being nominal)
batches = []
for i in range(N_BATCHES):
    batches.append(
        pd.DataFrame(np.random.randint(1, 10, size=(10, 5)))
    )

# employ created data as stream and feed it to drift detector
for b in batches:
    c.feed_new_batch(b)
tools.manage_jvm_stop() # cleanup
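To give an intuition for the parameters factor_warn and factor_change used in the example, the sketch below shows one plausible reading: each batch is reduced to a single summary value, and a new value is flagged when it deviates from the mean of earlier values by more than a multiple of their standard deviation. This follows the Chebyshev-style thresholding described in the original CDCStream paper, but it is an assumption about how the factors are applied; the function classify_batch is hypothetical, does not use WEKA, and is not the package's actual implementation.

```python
import statistics


def classify_batch(history, new_value, factor_warn=2.0, factor_change=3.0):
    """Classify a new batch summary relative to earlier batch summaries.

    Hypothetical illustration: thresholds are multiples of the sample
    standard deviation of the history (assumed, not taken from cdcstream).
    """
    if len(history) < 2:
        return "ok"  # not enough history to estimate a spread

    mean = statistics.mean(history)
    std = statistics.stdev(history)
    deviation = abs(new_value - mean)

    # check the stricter threshold first so that a change is not reported as a warning
    if deviation > factor_change * std:
        return "change"
    if deviation > factor_warn * std:
        return "warning"
    return "ok"


if __name__ == "__main__":
    history = [0.50, 0.52, 0.48, 0.51, 0.49]  # made-up summary values
    for value in (0.505, 0.54, 0.60):
        print(value, classify_batch(history, value))
```

With the made-up history above, the three values fall below the warning threshold, between the two thresholds, and above the change threshold, respectively. Larger factors make the detector less sensitive.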
Development
- Python poetry
- Installation of python-javabridge fails with poetry versions > 1.1.15 (at the time of writing, the newest poetry version is 1.3.1); this might be related to PEP 621. A workaround is to install python-javabridge via pip:
python -m poetry run pip install python-javabridge # from outside the virtual environment
- afterwards, continue installation via poetry
python -m poetry install
License
Code is copyright to the FZI Research Center for Information Technology and released under the GNU General Public License v3.0. All dependencies are copyright to the respective authors and released under the respective licenses. A copy of these licenses is provided in LICENSE_LIBRARIES.
Acknowledgements
This software was developed at the FZI Research Center for Information Technology. The associated research was funded by the German Federal Ministry of Education and Research (grant number: 02K18D033) within the context of the project SEAMLESS.
To Do
- add tests