Auto-modelling is a convenient library for training and tuning machine learning models automatically.
Its main features include the following:
- Preprocess columns of all data types (numeric, categorical, text).
- Train machine learning models and tune their parameters automatically.
- Return the top n best models with optimized parameters.
- Apply stacking to combine the n best models returned by the library, or your own fitted models, with the aim of improving on the individual models' results.
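
To make the stacking idea concrete, here is a minimal sketch that uses scikit-learn's `StackingClassifier` rather than this library's own `Stack` class, whose internals are not described here. The synthetic data and the three level-0 models are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real, already preprocessed dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Level-0 models: stand-ins for the "n best" models (picked arbitrarily here).
level_0 = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Level-1 model: trained on the cross-validated predictions of the level-0 models.
stacked = StackingClassifier(
    estimators=level_0, final_estimator=LogisticRegression(), cv=5
)
stacked.fit(x_train, y_train)

stack_accuracy = stacked.score(x_test, y_test)
print(f"stacked accuracy: {stack_accuracy:.3f}")
```

Whether stacking actually beats the best single model depends on the data, so it is worth comparing `stack_accuracy` against each level-0 model's own score.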
The machine learning models include the following:
- for classification: LogisticRegression
- for regression: LinearRegression
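
As a rough illustration of what "tune parameters and return the top n models" can look like for one of the models listed above, the sketch below uses scikit-learn's `GridSearchCV`. It is not this library's implementation; the parameter grid, the synthetic data, and the names `n_best`, `top_params`, and `top_models` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for a real training set (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Try several regularization strengths with 5-fold cross-validation.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy"
)
search.fit(X, y)

# Order the candidate settings by their cross-validation rank (1 = best).
n_best = 3
results = search.cv_results_
order = sorted(
    range(len(results["params"])), key=lambda i: results["rank_test_score"][i]
)
top_params = [results["params"][i] for i in order[:n_best]]

# Refit one model per top-ranked parameter setting on the full training data.
top_models = [LogisticRegression(max_iter=1000, **p).fit(X, y) for p in top_params]

for params in top_params:
    print(params)
```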

Install the package with pip:

    pip install auto-modelling

Basic usage:

    from auto_modelling.classification import GoClassify
    from auto_modelling.regression import GoRegress
    from auto_modelling.preprocess import DataManager
    from auto_modelling.stack import Stack

    # preprocessing data
    dm = DataManager(directory='preprocess_tools')
    train, test = dm.drop_sparse_columns(x_train, x_test)
    train, test = dm.process_data(train, test)
    # the encoders are stored in the directory called data_process_tools.

    # use the same processing tools to process new data
    # predict_x should have the same format as x_train/x_test
    predict_data = dm.process_predict_data(predict_x)

    # classification
    clf = GoClassify(n_best=1)
    best = clf.train(x_train, y_train)
    y_pred = best.predict(x_test)

    # regression
    reg = GoRegress(n_best=1)
    best = reg.train(x_train, y_train)
    y_pred = best.predict(x_test)

    # get top 3 best models
    clf = GoClassify(n_best=3)
    bests = clf.train(x_train, y_train)
    y_preds = [m.predict(x_test) for m in bests]

    # stack top 3 best models
    stack = Stack(n_models=3)
    level_0_models, level_1_model = stack.train(x_train, y_train, x_test, y_test)
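
The usage snippet assumes that `x_train`, `x_test`, `y_train`, and `y_test` already exist. One possible way to produce them, sketched here with scikit-learn's bundled iris dataset and `train_test_split` (the dataset, split ratio, and random seed are assumptions, not requirements of this library):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a small example dataset as a pandas DataFrame / Series.
iris = load_iris(as_frame=True)
features: pd.DataFrame = iris.data
labels: pd.Series = iris.target

# Hold out 20% of the rows for testing, keeping class proportions equal.
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

print(x_train.shape, x_test.shape)
```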

An example script, sample.py, is in the root directory of this package; run it to see the library in action.

Clone the repo, then create the virtual environment and install the requirements:

    mkvirtualenv auto
    workon auto
    pip install -r requirements.txt

if you have issues in installing
- TODO: feature selection, evaluation metrics
- Ideally, any dataframe passed into this repo should be processed as follows:
- drop columns that have too many nulls (Done)
- fill NA for both numeric and non-numeric values (Done)
- encode non-numeric values (Done)
- scale values if needed
- balance the dataset if needed
- mode =
- split the dataset
- tune parameters and select models (Done)
- feature selection
- return a model with its parameters, columns, and a script to process x_test (Done)
- stacking with customized fitted models (Done)
- mode =
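
The preprocessing steps marked as done above (dropping sparse columns, filling missing values, encoding non-numeric values) can be sketched in plain pandas as follows. This is an illustrative sketch, not the code behind `DataManager`: the helper `preprocess`, the `max_null_ratio` threshold, the median fill, and the `"missing"` placeholder are all assumptions.

```python
import pandas as pd


def preprocess(train: pd.DataFrame, test: pd.DataFrame, max_null_ratio: float = 0.5):
    """Hypothetical helper: drop sparse columns, fill NaNs, encode categoricals."""
    # 1. Keep only columns whose share of nulls in the training set is acceptable.
    keep = train.columns[train.isna().mean() <= max_null_ratio]
    train, test = train[keep].copy(), test[keep].copy()

    for col in keep:
        if pd.api.types.is_numeric_dtype(train[col]):
            # 2a. Numeric column: fill gaps with the training median.
            fill = train[col].median()
            train[col] = train[col].fillna(fill)
            test[col] = test[col].fillna(fill)
        else:
            # 2b. Non-numeric column: fill gaps with a placeholder category.
            train[col] = train[col].fillna("missing")
            test[col] = test[col].fillna("missing")
            # 3. Encode categories as integers learned from the training set;
            #    categories unseen during training are mapped to -1.
            codes = {cat: i for i, cat in enumerate(sorted(train[col].unique()))}
            train[col] = train[col].map(codes)
            test[col] = test[col].map(codes).fillna(-1).astype(int)
    return train, test


train = pd.DataFrame({
    "age": [20.0, None, 40.0, 30.0],
    "city": ["Paris", "Rome", None, "Paris"],
    "mostly_empty": [None, None, None, 1.0],
})
test = pd.DataFrame({
    "age": [None, 50.0],
    "city": ["Rome", "Oslo"],
    "mostly_empty": [None, None],
})

clean_train, clean_test = preprocess(train, test)
print(clean_train)
print(clean_test)
```

Fitting the fill values and category codes on the training set only, then reusing them on the test set, mirrors the idea of processing new data with the same stored tools.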