Jenga

Overview

Jenga is an open source experimentation library that allows data science practitioners and researchers to study the effect of common data corruptions (e.g., missing values, broken character encodings) on the prediction quality of their ML models.

We designed Jenga around three core abstractions:

Tasks contain a raw dataset, an ML model and a prediction task.
Data corruptions take raw input data and randomly apply certain data errors to them (e.g., missing values).
Evaluators take a task and data corruptions, and execute the evaluation by repeatedly corrupting the task's test data and recording the model's predictive performance on the corrupted test data.
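The interplay of the three abstractions can be sketched in plain Python. This is a minimal illustration, not Jenga's API: the names ThresholdModel, ToyTask, ToyMissingValues and evaluate are hypothetical, the data is synthetic, and a fixed threshold rule stands in for a trained ML model.

```python
import random
from statistics import mean


class ThresholdModel:
    """Hypothetical stand-in for a trained ML model: predicts 1 when x > 5."""

    def predict(self, row):
        x = row["x"]
        # A missing value falls back to the default class 0.
        return 0 if x is None else int(x > 5.0)


class ToyTask:
    """Hypothetical task: raw test data, a model, and a binary prediction target."""

    def __init__(self, n=200, seed=0):
        rng = random.Random(seed)
        self.test_data = [{"x": rng.uniform(0, 10)} for _ in range(n)]
        self.test_labels = [int(row["x"] > 5.0) for row in self.test_data]
        self.model = ThresholdModel()

    def score(self, data):
        """Accuracy of the model on `data`, measured against the clean test labels."""
        preds = [self.model.predict(row) for row in data]
        return sum(p == y for p, y in zip(preds, self.test_labels)) / len(preds)


class ToyMissingValues:
    """Hypothetical corruption: set `column` to None in a random `fraction` of rows."""

    def __init__(self, column, fraction):
        self.column = column
        self.fraction = fraction

    def __repr__(self):
        return f"ToyMissingValues({self.column!r}, {self.fraction})"

    def transform(self, data, rng):
        corrupted = [dict(row) for row in data]  # copy: never mutate the clean test set
        k = int(self.fraction * len(corrupted))
        for i in rng.sample(range(len(corrupted)), k):
            corrupted[i][self.column] = None
        return corrupted


def evaluate(task, corruptions, repetitions=5, seed=42):
    """Repeatedly corrupt the task's test data and record mean accuracy per corruption."""
    rng = random.Random(seed)
    results = {"clean": task.score(task.test_data)}
    for corruption in corruptions:
        scores = [task.score(corruption.transform(task.test_data, rng))
                  for _ in range(repetitions)]
        results[repr(corruption)] = mean(scores)
    return results


task = ToyTask()
report = evaluate(task, [ToyMissingValues("x", 0.1), ToyMissingValues("x", 0.5)])
for name, accuracy in report.items():
    print(f"{name}: {accuracy:.3f}")
```

Because transform works on a copy, every repetition starts from the same clean test data. In this toy setup only rows whose value was blanked out can be misclassified, so the accuracy under a corruption fraction f stays at or above 1 - f.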

Jenga's goal is to assist data scientists with detecting such errors early, so that they can protect their models against them. We provide a Jupyter notebook outlining the most basic usage of Jenga.

Note that you can implement custom tasks and data corruptions by extending the corresponding provided base classes.
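As an illustration of this extension pattern, the following sketch defines a custom corruption for broken character encodings, one of the error types mentioned in the overview. The base class Corruption and the subclass MojibakeCorruption are hypothetical stand-ins, not Jenga's actual classes, whose names and method signatures may differ; the sketch only shows the idea of subclassing a base class and overriding its transform method.

```python
import random
from abc import ABC, abstractmethod


class Corruption(ABC):
    """Hypothetical base class; Jenga's actual base class and signatures may differ."""

    @abstractmethod
    def transform(self, data):
        """Return a corrupted copy of `data`, a list of dict rows."""


class MojibakeCorruption(Corruption):
    """Hypothetical corruption: UTF-8 text misread as Latin-1 in a random fraction of rows."""

    def __init__(self, column, fraction, seed=0):
        self.column = column
        self.fraction = fraction
        self.rng = random.Random(seed)

    def transform(self, data):
        corrupted = [dict(row) for row in data]  # leave the input rows untouched
        k = int(self.fraction * len(corrupted))
        for i in self.rng.sample(range(len(corrupted)), k):
            value = corrupted[i][self.column]
            if isinstance(value, str):
                # "ü" is encoded as the UTF-8 bytes C3 BC, which Latin-1 decodes to "Ã¼".
                corrupted[i][self.column] = value.encode("utf-8").decode("latin-1")
        return corrupted


rows = [{"city": "München"}, {"city": "Köln"}, {"city": "Zürich"}]
print(MojibakeCorruption("city", fraction=1.0).transform(rows))
```

Decoding with Latin-1 never raises an error, since every byte maps to a character, which makes it a convenient way to reproduce the kind of garbled text that results from a mismatched encoding.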

We additionally provide three advanced usage examples of Jenga:

Installation

Jenga requires Python 3.6 and a virtual environment (the steps below use Python's built-in venv module). You can get the Jenga code running as follows:

Check out this git repository
Create a virtual environment with python3.6 -m venv env
Activate the environment with source env/bin/activate
Install the latest version of pip with pip install --upgrade pip
Install the dependencies with pip install -r requirements.txt
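The setup steps after the checkout can also be scripted. The following is a minimal sketch, not part of Jenga: the helper names setup_commands and main are hypothetical, and the script assumes a POSIX layout (env/bin/python) and that it is run from the root of the checked-out repository. Instead of activating the environment, it calls the environment's own interpreter directly, which has the same effect for the pip commands.

```python
import subprocess
import sys
import venv
from pathlib import Path


def setup_commands(env_dir="env", requirements="requirements.txt"):
    """pip upgrade and dependency installation, run through the environment's interpreter.

    Calling env/bin/python directly replaces `source env/bin/activate`;
    the bin/ layout assumes a POSIX system, as the manual steps do.
    """
    python = str(Path(env_dir) / "bin" / "python")
    return [
        [python, "-m", "pip", "install", "--upgrade", "pip"],
        [python, "-m", "pip", "install", "-r", requirements],
    ]


def main(env_dir="env"):
    """Hypothetical entry point; run from the root of the checked-out repository."""
    if sys.version_info[:2] != (3, 6):
        print("warning: these instructions target Python 3.6", file=sys.stderr)
    # Equivalent of `python3.6 -m venv env`, using the standard library venv module.
    venv.create(env_dir, with_pip=True)
    for command in setup_commands(env_dir):
        subprocess.run(command, check=True)  # stop on the first failing step
```

Calling main() creates the environment and installs the dependencies; keeping the command construction in a separate function makes it easy to inspect the commands before running them.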

Research

Jenga is based on experiences and code from our ongoing research efforts:

Sebastian Schelter, Tammo Rukat, Felix Biessmann (2020). Learning to Validate the Predictions of Black Box Classifiers on Unseen Data. ACM SIGMOD.
Tammo Rukat, Dustin Lange, Sebastian Schelter, Felix Biessmann (2020). Towards Automated ML Model Monitoring: Measure, Improve and Quantify Data Quality. ML Ops workshop at the Conference on Machine Learning and Systems (MLSys).
Felix Biessmann, Tammo Rukat, Philipp Schmidt, Prathik Naidu, Sebastian Schelter, Andrey Taptunov, Dustin Lange, David Salinas (2019). DataWig - Missing Value Imputation for Tables. JMLR (open source track).

jenga
Release 0.0.1a0
