mlmodels : Model ZOO for Pytorch, Tensorflow, Keras, Gluon models...

Model ZOO with Lightweight Functional interface to wrap access to Recent and State o Art Deep Learning, ML models and Hyper-Parameter Search, cross platforms such as Tensorflow, Pytorch, Gluon, Keras,...
Logic follows sklearn API: fit, predict, transform, metrics, save, load
Goal is to transform Jupyter code into Semi-Prod code with minimal code change ...
Model list is available here : https://github.com/arita37/mlmodels/blob/dev/README_model_list.md
Why Functional interface instead of OOP ? Just Functional reduces the amount of code needed, focus more on the computing part (vs design part), a bit easier maintenability for medium size project, good for scientific computing process.

#### Docs here:   https://mlmodels.readthedocs.io/en/latest/  (incomplete docs)

######################################################################################

① Installation

Install as editable package (ONLY dev branch)

cd yourfolder
git clone https://github.com/arita37/mlmodels.git mlmodels
cd mlmodels
git checkout dev     
pip install -e .  --no-deps

Dependencies

optuna
tensorflow>=1.14.0
pytorch>=0.4.0
keras>=2.0
gluon
autogluon
gluonts
pandas>=0.24.2
scipy>=1.3.0
numexpr>=2.6.8 
scikit-learn>=0.21.2

#######################################################################################

② How to add a new model

Source code structure as below

docs: documentation
mlmodels: interface wrapper for pytorch, keras, gluon, tf, transformer NLP for train, hyper-params searchi.
- model_xxx: folders for each platform with same interface defined in template folder
- dataset: store dataset files for test runs.
- template: template interface wrapper which define common interfaces for whole platforms
- ztest: testing output for each sample testing in model_xxx
ztest: testing output for each sample testing in model_xxx

How to define a custom model

1. Create a file `mlmodels\model_XXXX\mymodel.py` , XXX: tch: pytorch, tf:tensorflow, keras:keras, ....

Declare below classes/functions in the created file:

Class Model()                                                 :   Model definition
      __init__(model_pars)                                    :   
                            
def fit(model, data_pars, model_pars, compute_pars, )         : Train the model
def predict(model, sess, data_pars, compute_pars, out_pars )  : Predict the results
def metric(ytrue, ypred, yproba, data_pars, compute_pars, out_pars )         : Measure the results

def get_params()                                              : returnparameters of the model
def get_dataset(data_pars)                                    : load dataset
def test()                                                    : example running the model     
def test2()                                                   : example running the model in global settings  

def save(model, path)                                         : save the model
def load(path)                                                : load the trained model

Infos

model :         Model(model_pars), instance of Model() object
sess  :         Session for TF model.
model_pars :    dict containing info on model definition.
data_pars :     dict containing info on input data.
compute_pars :  dict containing info on model compute.
out_pars :      dict containing info on output folder.

2. Write your code and create test() to test your code. **

Declare model definition in Class Model()

    self.model = DeepFM(linear_cols, dnn_cols, task=compute_pars['task']) # mlmodels/model_kera/01_deectr.py
    # Model Parameters such as `linear_cols, dnn_cols` is obtained from function `get_params` which return `model_pars, data_pars, compute_pars, out_pars`

Implement pre-process data in function get_dataset which return data for both training and testing dataset Depend on type of dataset, we could separate function with datatype as below example

    if data_type == "criteo":
        df, linear_cols, dnn_cols, train, test, target = _preprocess_criteo(df, **kw)

    elif data_type == "movie_len":
        df, linear_cols, dnn_cols, train, test, target = _preprocess_movielens(df, **kw)

Call fit/predict with initialized model and dataset

    # get dataset using function get_dataset
    data, linear_cols, dnn_cols, train, test, target = get_dataset(**data_pars)
    # fit data
     model.model.fit(train_model_input, train[target].values,
                        batch_size=m['batch_size'], epochs=m['epochs'], verbose=2,
                        validation_split=m['validation_split'], )
    # predict data
    pred_ans = model.model.predict(test_model_input, batch_size= compute_pars['batch_size'])

Calculate metric with predict output

    # input of metrics is predicted output and ground truth data
    def metrics(ypred, ytrue, data_pars, compute_pars=None, out_pars=None, **kwargs):

Example https://github.com/arita37/mlmodels/tree/dev/mlmodels/template https://github.com/arita37/mlmodels/blob/dev/mlmodels/model_gluon/gluon_deepar.py https://github.com/arita37/mlmodels/blob/dev/mlmodels/model_gluon/gluon_deepar.json

3. Create JSON config file inside /model_XXX/mymodel.json **

Separate configure for staging development environment such as testing and production phase then for each staging, declare some specific parameters for model, dataset and also output
Example

    {
        "test": {
            "model_pars": {
                "learning_rate": 0.001,
                "num_layers": 1,
                "size": 6,
                "size_layer": 128,
                "output_size": 6,
                "timestep": 4,
                "epoch": 2
            },
            "data_pars": {
                "data_path": "dataset/GOOG-year.csv",
                "data_type": "pandas",
                "size": [0, 0, 6],
                "output_size": [0, 6]
            },
            "compute_pars": {
                "distributed": "mpi",
                "epoch": 10
            },
            "out_pars": {
                "out_path": "dataset/",
                "data_type": "pandas",
                "size": [0, 0, 6],
                "output_size": [0, 6]
            }
        },
    
        "prod": {
            "model_pars": {},
            "data_pars": {}
        }
    }

#######################################################################################

③ CLI tools: package provide below tools

ml_models
ml_optim

How to use tools

Lightweight Functional interface to execute models ml_models : mlmodels/models.py

ml_models --do  
    model_list  :  list all models in the repo                            
    testall     :  test all modules inside model_tf
    test        :  test a certain module inside model_tf
    fit         :  wrap fit generic m    ethod
    predict     :  predict  using a pre-trained model and some data
    generate_config  :  generate config file from code source
    
  ## --do fit  
  --model_uri     model_tf.1_lstm
  --save_folder   myfolder/
  --config_file   myfile.json
  --config_mode   "test"


  ## --do predict  
  --load_folder   mymodel_folder/

Lightweight Functional interface to wrap Hyper-parameter Optimization ml_optim : mlmodels/optim.py

ml_optim --do
    test      :  Test the hyperparameter optimization for a specific model
    test_all  :  TODO, Test all
    search    :  search for the best hyperparameters of a specific model

Lightweight Functional interface to run test samples ml_test

ml_test

Command line tool sample

generate config file

ml_models  --do generate_config  --model_uri model_tf.1_lstm.py  --save_folder "c:\myconfig\"

TensorFlow LSTM model

ml_models  --model_uri model_tf/1_lstm.py  --do test

PyTorch models

ml_models  --model_uri model_tch/mlp.py  --do test

Custom Models

ml_models --do test  --model_uri "D:\_devs\Python01\gitdev\mlmodels\mlmodels\model_tf\1_lstm.py"

Distributed Pytorch on CPU (using Horovod and MPI on Linux, 4 processes) in model_tch/mlp.py

mlmodels/distri_torch_mpirun.sh   4    model_tch.mlp    mymodel.json

Model param search test

ml_optim --do test

For normal optimization search method

ml_optim --do search --ntrials 1  --config_file optim_config.json --optim_method normal
ml_optim --do search --ntrials 1  --config_file optim_config.json --optim_method prune  ###### for pruning method

HyperParam standalone run

ml_optim --modelname model_tf.1_lstm.py  --do test
ml_optim --modelname model_tf.1_lstm.py  --do search

#######################################################################################

④ Interface

models.py

   module_load(model_uri)
   model_create(module)
   fit(model, module, session, data_pars, out_pars   )
   metrics(model, module, session, data_pars, out_pars)
   predict(model, module, session, data_pars, out_pars)
   save(model, path)
   load(model)

optim.py

   optim(modelname="model_tf.1_lstm.py",  model_pars= {}, data_pars = {}, compute_pars={"method": "normal/prune"}
       , save_folder="/mymodel/", log_folder="", ntrials=2) 

   optim_optuna(modelname="model_tf.1_lstm.py", model_pars= {}, data_pars = {}, compute_pars={"method" : "normal/prune"},
                save_folder="/mymodel/", log_folder="", ntrials=2)

Generic parameters

   Define in models_config.json
   model_params      :  Relative to model definition 
   compute_pars      :  Relative to  the compute process
   data_pars         :  Relative to the input data
   out_pars          :  Relative to outout data

Sometimes, data_pars is required to setup the model (ie CNN with image size...)

####################################################################################

⑤ Code sample

from mlmodels.models import module_load, data_loader, create_model, fit, predict, stats
from mlmodels.models import load #Load model weights

#### Training
model_pars   =  {  "num_layers": 1,
                  "size": ncol_input, "size_layer": 128, "output_size": ncol_output, "timestep": 4,
                }
data_pars    =  {}
compute_pars =  { "learning_rate": 0.001, }

module        =  module_load( model_uri="model_tf.1_lstm.py" )  #Load file definition
model         =  model_create(module, model_pars)    # Create Model instance
model, sess   =  fit(model, module, data_pars)       # fit the model
metrics_val   =  metrics( model, sess, ["loss"])     # get stats
model.save( "myfolder/", model, module, sess,)

Inference

model = load(folder)    #Create Model instance
ypred = module.predict(model, module, data_pars, compute_pars)     # predict pipeline

#######################################################################################

⑥ Naming convention

Function naming

pd_   :  input is pandas dataframe
np_   :  input is numpy
sk_   :  inout is related to sklearn (ie sklearn model), input is numpy array
plot_

_col_  :  name for colums
_colcat_  :  name for category columns
_colnum_  :  name for numerical columns (folat)
_coltext_  : name for text data
_colid_  : for unique ID columns\

_stat_ : show statistics
_df_  : dataframe
_num_ : statistics

col_ :  function name for column list related.

Argument Variables naming

df     :  variable name for dataframe
colname  : for list of columns
colexclude
colcat : For category column
colnum :  For numerical columns
coldate : for date columns
coltext : for raw text columns

###############################################################################

⑦ Conda install

conda create -n py36 python=3.6.5  -y
source activate py36
pip install  ipykernel spyder-kernels=0.* -y

mlmodels
Release 0.38.1

Release 0.38.1

0.38.1

0.37.1

0.36.1

0.35.1

0.34.1

0.27.1

0.26.1

0.25.1

0.24.1

0.23.1

Documentation

mlmodels : Model ZOO for Pytorch, Tensorflow, Keras, Gluon models...

① Installation

Dependencies

② How to add a new model

Source code structure as below

How to define a custom model

1. Create a file `mlmodels\model_XXXX\mymodel.py` , XXX: tch: pytorch, tf:tensorflow, keras:keras, ....

2. Write your code and create test() to test your code. **

3. Create JSON config file inside /model_XXX/mymodel.json **

③ CLI tools: package provide below tools

How to use tools

Command line tool sample

generate config file

TensorFlow LSTM model

PyTorch models

Custom Models

Distributed Pytorch on CPU (using Horovod and MPI on Linux, 4 processes) in model_tch/mlp.py

Model param search test

For normal optimization search method

HyperParam standalone run

④ Interface

Generic parameters

⑤ Code sample

Inference

⑥ Naming convention

Function naming

Argument Variables naming

⑦ Conda install

Stats

Releases

Contributors

mlmodels Release 0.38.1

Release 0.38.1 Toggle Dropdown 0.38.1 0.37.1 0.36.1 0.35.1 0.34.1 0.27.1 0.26.1 0.25.1 0.24.1 0.23.1

Documentation

mlmodels : Model ZOO for Pytorch, Tensorflow, Keras, Gluon models...

① Installation

Dependencies

② How to add a new model

Source code structure as below

How to define a custom model

1. Create a file mlmodels\model_XXXX\mymodel.py , XXX: tch: pytorch, tf:tensorflow, keras:keras, ....

2. Write your code and create test() to test your code. **

3. Create JSON config file inside /model_XXX/mymodel.json **

③ CLI tools: package provide below tools

How to use tools

Command line tool sample

generate config file

TensorFlow LSTM model

PyTorch models

Custom Models

Distributed Pytorch on CPU (using Horovod and MPI on Linux, 4 processes) in model_tch/mlp.py

Model param search test

For normal optimization search method

HyperParam standalone run

④ Interface

Generic parameters

⑤ Code sample

Inference

⑥ Naming convention

Function naming

Argument Variables naming

⑦ Conda install

Stats

Releases

Contributors

mlmodels
Release 0.38.1

Release 0.38.1

0.38.1

0.37.1

0.36.1

0.35.1

0.34.1

0.27.1

0.26.1

0.25.1

0.24.1

0.23.1

1. Create a file `mlmodels\model_XXXX\mymodel.py` , XXX: tch: pytorch, tf:tensorflow, keras:keras, ....