mlmachine

"mlmachine is a Python library that organizes and accelerates notebook-based machine learning experiments."

Novel Functionality
Example Notebooks
Articles on Medium
Installation
Feedback
Acknowledgments

Novel Functionality

Easy, Elegant EDA

mlmachine creates beautiful and informative EDA panels with ease:

# create EDA panel for all "category" features
for feature in mlmachine_titanic.data.mlm_dtypes["category"]:
    mlmachine_titanic.eda_cat_target_cat_feat(
        feature=feature,
        legend_labels=["Died", "Survived"],
    )

Pandas-in / Pandas-out Pipelines

mlmachine makes Scikit-learn transformers Pandas-friendly.

For example, wrapping OneHotEncoder() in the mlmachine utility PandasTransformer() returns a DataFrame, with its column names and index intact, rather than a NumPy array.

KFold Target Encoding

mlmachine includes a utility called KFoldEncoder, which applies target encoding to categorical features and uses out-of-fold encoding to prevent target leakage:

from category_encoders import TargetEncoder
from sklearn.model_selection import KFold

# perform 5-fold target encoding with TargetEncoder from the category_encoders library
encoder = KFoldEncoder(
    target=mlmachine_titanic_train.target,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    encoder=TargetEncoder,
)
encoder.fit_transform(mlmachine_titanic_train.data[["Pclass"]])
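
To make the out-of-fold idea concrete, here is a minimal sketch that uses only pandas and scikit-learn. The function out_of_fold_target_encode and the toy data are assumptions made for illustration; the sketch uses plain per-category means rather than the smoothed estimates that TargetEncoder produces, and it is not KFoldEncoder's actual implementation:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def out_of_fold_target_encode(feature, target, cv):
    """Encode each row with target means learned from the other folds only."""
    encoded = pd.Series(np.nan, index=feature.index, dtype=float)
    for fit_idx, encode_idx in cv.split(feature):
        # Category means come from the fitting folds only, never from the
        # rows being encoded, which is what prevents target leakage.
        fold_means = target.iloc[fit_idx].groupby(feature.iloc[fit_idx]).mean()
        # Categories unseen in the fitting folds fall back to the fitting-fold mean.
        fallback = target.iloc[fit_idx].mean()
        encoded.iloc[encode_idx] = (
            feature.iloc[encode_idx].map(fold_means).fillna(fallback).to_numpy()
        )
    return encoded


# Toy stand-ins for the Titanic "Pclass" feature and "Survived" target
pclass = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3, 3], name="Pclass")
survived = pd.Series([1, 1, 0, 1, 0, 1, 0, 0, 1, 0], name="Survived")
cv = KFold(n_splits=5, shuffle=True, random_state=0)

encoded = out_of_fold_target_encode(pclass, survived, cv)
print(pd.concat([pclass, survived, encoded.rename("Pclass_encoded")], axis=1))
```

Each row's encoded value is computed without access to that row's own target, so a model trained on the encoded column cannot simply read the answer back.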

Crowd-sourced Feature Importance & Exhaustive Feature Selection

mlmachine employs a robust approach to estimating feature importance by using a variety of techniques:

Tree-based Feature Importance
Recursive Feature Elimination
Sequential Forward Selection
Sequential Backward Selection
F-value / p-value
Variance
Target Correlation

A single call runs all of these techniques across multiple estimators and one or more scoring metrics:

from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# instantiate custom models
rf2 = RandomForestClassifier(max_depth=2)
rf4 = RandomForestClassifier(max_depth=4)
rf6 = RandomForestClassifier(max_depth=6)

# estimator list - default XGBClassifier, default
# RandomForestClassifier and three custom models
estimators = [
    XGBClassifier,
    RandomForestClassifier,
    rf2,
    rf4,
    rf6,
]

# instantiate FeatureSelector object
fs = mlmachine_titanic_train.FeatureSelector(
    data=mlmachine_titanic_train.data,
    target=mlmachine_titanic_train.target,
    estimators=estimators,
)

# run feature importance techniques, use ROC AUC and
# accuracy score metrics and 0 CV folds (where applicable)
feature_selector_summary = fs.feature_selector_suite(
    sequential_scoring=["roc_auc", "accuracy_score"],
    sequential_n_folds=0,
    save_to_csv=True,
)
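
One plausible way to combine several importance techniques is to rank the features within each technique and then average the ranks. The sketch below does this for four of the techniques listed above using scikit-learn and pandas directly. The synthetic data, column names, and aggregation rule are assumptions for illustration and are not taken from FeatureSelector's implementation:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif

# Synthetic stand-in for a prepared training set
X_arr, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])
y = pd.Series(y, index=X.index)

# One column of raw scores per technique; higher means more important
scores = pd.DataFrame(index=X.columns)
forest = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
scores["tree_importance"] = forest.fit(X, y).feature_importances_
scores["f_value"] = f_classif(X, y)[0]
scores["variance"] = X.var().to_numpy()
scores["target_correlation"] = X.corrwith(y).abs().to_numpy()

# Rank within each technique (1 = most important), then average across techniques
ranks = scores.rank(ascending=False)
ranks["average_rank"] = ranks.mean(axis=1)
summary = ranks.sort_values("average_rank")
print(summary)
```

Averaging ranks rather than raw scores keeps techniques with very different scales, such as F-values and variances, from dominating one another.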

The features are then winnowed away, from least important to most important, and each remaining subset is scored by cross-validation in search of an optimal feature subset.
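
The winnowing step can be sketched as a loop that drops the lowest-ranked features one at a time and cross-validates a model on whatever remains. Everything below is an illustrative assumption: the synthetic data, the use of tree-based importance as the ranking, the estimator, and the 5-fold accuracy metric. It is not mlmachine's own procedure:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a prepared training set
X_arr, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])

# Stand-in ranking: order features from most to least important
forest = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0)
importances = pd.Series(forest.fit(X, y).feature_importances_, index=X.columns)
ranked = list(importances.sort_values(ascending=False).index)

# Drop the n least important features, then score the remaining subset
rows = []
for n_dropped in range(len(ranked)):
    kept = ranked[: len(ranked) - n_dropped]
    model = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0)
    score = cross_val_score(model, X[kept], y, cv=5, scoring="accuracy").mean()
    rows.append({"n_dropped": n_dropped, "features": kept, "cv_accuracy": score})

results = pd.DataFrame(rows)
best = results.loc[results["cv_accuracy"].idxmax()]
print(results[["n_dropped", "cv_accuracy"]])
print("best subset:", best["features"])
```

The subset with the highest cross-validated score is a candidate for the final feature set; in practice the comparison would be repeated for each estimator and metric of interest.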

Hyperparameter Tuning with Bayesian Optimization

mlmachine can perform Bayesian optimization on multiple estimators in one shot, and includes functionality for visualizing model performance and parameter selections:

# generate parameter selection panels for each parameter
mlmachine_titanic_train.model_param_plot(
    bayes_optim_summary=bayes_optim_summary,
    estimator_class="KNeighborsClassifier",
    estimator_parameter_space=estimator_parameter_space,
    n_iter=100,
)

Example Notebooks

All examples can be viewed here

Example Notebook 1 - Learn the basics of mlmachine, how to create EDA panels, and how to execute Pandas-friendly Scikit-learn transformations and pipelines.

Example Notebook 2 - Learn how to use mlmachine to assess a dataset's pre-processing needs. See examples of how to use novel functionality, such as GroupbyImputer(), KFoldEncoder() and DualTransformer().

Example Notebook 3 - Learn how to perform thorough feature importance estimation, followed by an exhaustive, cross-validation-driven feature selection process.

Example Notebook 4 - Learn how to execute hyperparameter tuning with Bayesian optimization for multiple models and multiple parameter spaces in one simple execution.

Articles on Medium

mlmachine - Clean ML Experiments, Elegant EDA & Pandas Pipelines - Published 4/3/2020

mlmachine - GroupbyImputer, KFoldEncoder, and Skew Correction - Published 4/13/2020

Installation

Python Requirements: 3.6, 3.7

mlmachine uses the latest, or nearly latest, versions of all its dependencies, so installing it in a virtual environment is highly recommended.

pyenv

Create a new virtual environment:

$ pyenv virtualenv 3.7.5 mlmachine-env

Activate your new virtual environment:

$ pyenv activate mlmachine-env

Use pip to install mlmachine and all of its dependencies:

$ pip install mlmachine

anaconda

Create a new virtual environment:

$ conda create --name mlmachine-env python=3.7

Activate your new virtual environment:

$ conda activate mlmachine-env

Use pip to install mlmachine and all of its dependencies:

$ pip install mlmachine

Feedback

Any and all feedback is welcome. Please send me an email at petersontylerd@gmail.com

Acknowledgments

mlmachine stands on the shoulders of many great Python packages:

mlmachine
Release 0.1.5
