An ultimate workflow for solving machine learning competitions with tabular data.
Install KTS with pip:
    pip install kts
Compatible with Python 3.6+.
- Modular feature engineering
- Source code tracking
- Caching of interim results
- Standard library for feature engineering
- Easy customization
- Local leaderboard
- Leak-free stacking in one line
- Parallel or distributed backend (feature computing/training/inference/hyperparameter tuning) -- coming soon
    import kts
    from kts import *
Load data from user cache:
    train = kts.load('train')
    test = kts.load('test')
Create functions computing blocks of new features:
    @register
    def feature_1(df):
        ...

    @register
    def feature_2(df):
        ...

    @register
    def feature_3(df):
        ...
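To make the idea of registered, cached feature blocks more concrete, here is a minimal standard-library sketch. It is not the KTS implementation: `register_sketch`, `FEATURE_REGISTRY`, the `cache_key` argument and the `name_length` feature are all hypothetical names chosen for illustration, and plain lists of dicts stand in for dataframes.

```python
import functools
from typing import Callable, Dict, List, Tuple

# Hypothetical registry and cache; KTS may organize these very differently.
FEATURE_REGISTRY: Dict[str, Callable] = {}
_RESULT_CACHE: Dict[Tuple[str, str], List[dict]] = {}


def register_sketch(func):
    """Record func under its name and cache its result per (name, cache_key)."""

    @functools.wraps(func)
    def wrapper(rows, cache_key):
        key = (func.__name__, cache_key)
        if key not in _RESULT_CACHE:
            # Compute the feature block only on the first call for this key.
            _RESULT_CACHE[key] = func(rows)
        return _RESULT_CACHE[key]

    FEATURE_REGISTRY[func.__name__] = wrapper
    return wrapper


@register_sketch
def name_length(rows):
    # Toy feature block: one new column derived from an existing one.
    return [{"name_length": len(row["name"])} for row in rows]


rows = [{"name": "Ann"}, {"name": "Robert"}]
first = name_length(rows, cache_key="train")
second = name_length(rows, cache_key="train")  # served from the cache
```

The second call returns the cached object instead of recomputing, which is the behavior the "caching of interim results" bullet refers to in spirit.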
Combine them using FeatureSet:
fs_1 = FeatureSet([feature_1, feature_2, feature_3], target_columns=..., df_input=train)
Define a validation strategy:
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold

    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    val = Validator(skf, roc_auc_score)
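Conceptually, a validation strategy pairs a splitter with a metric and averages the metric over held-out folds. The sketch below shows that loop using only the standard library. It is an illustration under simplifying assumptions, not what `Validator` does internally: the helper names are hypothetical, the split is a plain interleaved k-fold rather than a stratified one, and the "model" simply predicts the majority class.

```python
from statistics import mean


def kfold_indices(n_rows, n_splits):
    """Yield (train_idx, valid_idx) pairs; row i belongs to fold i % n_splits."""
    for fold in range(n_splits):
        valid = [i for i in range(n_rows) if i % n_splits == fold]
        train = [i for i in range(n_rows) if i % n_splits != fold]
        yield train, valid


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def majority_class(y_train):
    """Toy model: always predict the most frequent training label."""
    return max(set(y_train), key=y_train.count)


def validate(y, n_splits, metric):
    """Fit on each training split, score on the held-out split, average."""
    scores = []
    for train_idx, valid_idx in kfold_indices(len(y), n_splits):
        label = majority_class([y[i] for i in train_idx])
        preds = [label] * len(valid_idx)
        scores.append(metric([y[i] for i in valid_idx], preds))
    return mean(scores)


y = [0, 0, 0, 1, 0, 0, 0, 1]
cv_score = validate(y, n_splits=4, metric=accuracy)
```

With this toy data, the unstratified split puts both positive rows into the same fold, so that fold is scored by a model that never saw a positive example. Avoiding that kind of imbalance is one reason to prefer a stratified splitter such as `StratifiedKFold` for classification.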
Train trackable models (built-in or custom) using your features and get their IDs on the local leaderboard:
    from zoo.binary_classification import CatBoostClassifier, LGBMClassifier, LogisticRegression

    cb = CatBoostClassifier(iterations=50)
    lgb = LGBMClassifier()

    summary_cb = val.score(cb, fs_1, verbose=False)
    summary_lgb = val.score(lgb, fs_1, verbose=False)
Call kts.stack to get a leak-free validator and a feature block containing the predictions of the first-level models, then add this block to your feature set and train a second-level model:
    ids_to_stack = [summary_cb['id'], summary_lgb['id']]
    val_stack, fc_stack = kts.stack(ids_to_stack)

    fs_stack = FeatureSet([feature_1, feature_2, feature_3, fc_stack], target_columns=..., df_input=train)
    logreg = LogisticRegression()
    summary_logreg = val_stack.score(logreg, fs_stack)
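Stacking is usually kept leak-free by using out-of-fold predictions: each training row is predicted by a first-level model that was fitted without that row. The following standard-library sketch illustrates the general technique only. It makes no claim about how `kts.stack` is implemented; the function name, the interleaved fold assignment and the toy "mean target per category" model are all assumptions made for the example.

```python
from statistics import mean


def out_of_fold_predictions(x, y, n_splits):
    """Predict every row with a model fitted on the other folds only."""
    oof = [None] * len(y)
    for fold in range(n_splits):
        train_idx = [i for i in range(len(y)) if i % n_splits != fold]
        valid_idx = [i for i in range(len(y)) if i % n_splits == fold]

        # Toy first-level model: mean target per category, training rows only.
        global_mean = mean(y[i] for i in train_idx)
        targets_by_category = {}
        for i in train_idx:
            targets_by_category.setdefault(x[i], []).append(y[i])

        for i in valid_idx:
            seen = targets_by_category.get(x[i])
            # Fall back to the global mean for categories unseen in training.
            oof[i] = mean(seen) if seen else global_mean
    return oof


x = ["a", "a", "b", "b", "a", "b"]
y = [1, 1, 0, 0, 1, 0]
oof = out_of_fold_predictions(x, y, n_splits=3)
```

The resulting `oof` list can serve as an input column for a second-level model, because no entry was produced by a model that saw that row's own target.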
Access the experiment by its ID and get final predictions for the test dataframe:
    logreg_id = summary_logreg['id']
    logreg_exp = lb[logreg_id]  # == kts.leaderboard[logreg_id]
    test_predictions = logreg_exp.predict(test)
Check out the docs for a detailed description of KTS features and best practices.
Command line interface
Use it to create a new project:
    $ mkdir project
    $ cd project
    $ kts init
or download an example from the kts-examples repo:
$ kts example titanic
The core of the project was designed and implemented by the team of Mikhail Andronov, Roman Gorb and Nikita Konodyuk under the mentorship of Alexander Avdyushenko during a project practice held by Yandex and the Higher School of Economics on 1-14 February 2019 at the Educational Center «Sirius».