automated-ml-pack

This package is designed for swift and automated machine learning practice, catering to both classification and regression tasks. It facilitates model training, grid search application, and the preservation of the best model. Furthermore, it stores and visualizes the best scores attained by other models using commonly employed evaluation metrics.


License
MIT
Install
pip install automated-ml-pack==1.0.4

Documentation

AutomatedMLPack

A Comprehensive Package for Automated Machine Learning

scikit-learn

Project Overview: This package is designed for swift and automated machine learning practice, catering to both classification and regression tasks. It facilitates model training, grid search application, and the preservation of the best model. Furthermore, it stores and visualizes the best scores attained by other models using commonly employed evaluation metrics.

What you will get:

  • Training regression and classification models
    • regression models: RandomForestRegressor, DecisionTreeRegressor, GradientBoostingRegressor, LinearRegression, XGBRegressor,CatBoostRegressor,AdaBoostRegressor
    • classification models: RandomForestClassifier, DecisionTreeClassifier, GradientBoostingClassifier, LogisticRegression, XGBClassifier, CatBoostClassifier, AdaBoostClassifier, MLPClassifier, SVC
  • Parameter Tunning Using GridsearchCV
  • Feature Engineering Using Feature-Engine Package
  • Feature Selection: Recursive Feature Elimination (classification), Recursive Feature Addition (classification) and SelectKBest (regression)
  • Visualization of Scores

Installation

AutomatedMLPack requires a Python>=3.11.

Create Environment

conda create -n envname python=3.11 -y
conda activate envname

Install AutomatedMLPack Package

pip install automated-ml-pack

CommandLine Options

USAGE:
run_train_pipeline -[INPUT_FILE] [options]

This tool facilitates the training of multiple machine learning models, optimizes the models, and saves the trained models. It also conducts model evaluation using diverse methods. Furthermore, the tool is capable of handling both regression and classification tasks. Additional options are described below.

options:
-h, --help show this help message and exit
--input_file INPUT_FILE Path to the input data in CSV/TSV format.
--input_type {csv,tsv} Type of input file format (csv or tsv)
--training_type {clf,reg} Type of training (e.g., "classification", "regression")
--target_column TARGET_COLUMN Name of the target column in the input dataframe
--engineer_new_features Flag to perform engineering of new features or not.
--output_base OUTPUT_BASE Base Name for most output files.
--test_size TEST_SIZE What fraction of the dataset should be used for testing. Normally, cross validation is performed on the other percentage of the data to access the model' generalization.
--standard_scaling Whether or not to apply scikit-learn standard scaler on the data.
--feature_selection Whether or not to perform feature selection on the dataset.
--feature_selection_method {addition,elimination}
Specify between recursive feature addition and recursive feature elimination algorithms for classification. By default, recursive feature addition is applied. For regression tasks, SelectKBest is used for feature selection.
--selectkbest_num_features SELECTKBEST_NUM_FEATURES Number of top features to select. For regression only.
--output_dir OUTPUT_DIR
Custom Name of Output Folder.
--return_data Select to include raw data, training data and test data in the output folders.
--no_param_finetune If true, hyperparameter search will not be performed for each model. Otherwise, hyperparameter tunning is performed.

Tutorials

The data must be in csv or tsv format and the user must provide the column name that contains the targets. The following is an example of how the tool can be used for classification tasks.

run_train_pipeline --input_file heart.csv --target_column HeartDisease --training_type clf --test_size 0.2 --feature_selection --feature_selection_method addition --output_dir heart_disease_classification

This script will take some time to run. The outputs will be stored in the provided output directory.

outputs cross-validation-score

License

MIT
Free Software, Hell Yeah!

Contributors


Cyrille M. NJUME

Want to contribute? Great!
The projects does not cover the following for now:

  • Multi-class Classification
  • Improved Analysis of Results
  • More Flexibility

References

[1] krishnaik06: https://github.com/krishnaik06/mlproject

Contact

For any feedback or queries, please reach out to cyrillemesue@gmail.com.