dvha-mlc


Keywords
dvha, dvh-analytics, dvha-mlc, dvha-machine-learning-core, bokeh, machine-learning, sklearn, wxpython-phoenix-gui
License
BSD-3-Clause
Install
pip install dvha-mlc==0.1.3

DVHA-Machine-Learning-Core

Core library of machine learning visualizations developed for DVH Analytics, adapted for generic use. Under active development.

Provide X and y arrays and get a GUI for data exploration and visualization. The application handles data splitting and modeling for you. The algorithms are based on scikit-learn.

The code is built upon these core libraries:

  • wxPython Phoenix
  • Bokeh
  • scikit-learn

To Run

NOTE: This application supports Python >= 3.5

Either clone this project or install from PyPI:
pip install dvha-mlc

From a python3 console:

from dvhamlc.gui import ModelApp
ModelApp(X, y)

Optionally specify an algorithm:

ModelApp(X, y, algorithm='random_forest', predictive_type='regression')

class ModelApp(X, y, algorithm='random_forest', predictive_type='regression', y_variable='Dependent Variable', y_categories=None, x_variables=None)

  • Parameters
    • X : array-like, shape (n_samples, n_features)
    • y : array-like, shape (n_samples)
    • algorithm : string, optional (default='random_forest')
      • Defines which machine learning algorithm to use. Currently supported values (and their sources) include:
        • random_forest
          • sklearn.ensemble.RandomForestRegressor
          • sklearn.ensemble.RandomForestClassifier
        • gradient_boosting
          • sklearn.ensemble.GradientBoostingRegressor
          • sklearn.ensemble.GradientBoostingClassifier
        • support_vector_machine
          • sklearn.svm.SVR
          • sklearn.svm.SVC
        • decision_tree
          • sklearn.tree.DecisionTreeRegressor
          • sklearn.tree.DecisionTreeClassifier
    • predictive_type : string, optional (default='regression')
      • May be either 'regression' or 'classification'
    • y_variable : string, optional (default='Dependent Variable')
      • Visual attribute only, sets the y-axis title
    • y_categories : array-like, shape (n_samples), optional (default=None)
      • Only applicable when predictive_type='classification'. This array maps the numerical values fed into the machine learning algorithm to labels. These labels appear in the hover messages when inspecting a plot.
    • x_variables : array-like, shape (n_features), optional (default=None)
      • The feature importance plot uses this data to display variable names instead of column indices
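
The algorithm and predictive_type options listed above pair up into eight scikit-learn estimators. As a quick reference, the pairing can be written as a lookup table. This is only a restatement of the documented list, not dvha-mlc's internal code; the names ESTIMATORS and estimator_path are hypothetical and exist only for this illustration.

```python
# Lookup table restating the documented (algorithm, predictive_type) pairs.
# Illustrative summary only; not taken from the dvha-mlc source code.
ESTIMATORS = {
    ("random_forest", "regression"): "sklearn.ensemble.RandomForestRegressor",
    ("random_forest", "classification"): "sklearn.ensemble.RandomForestClassifier",
    ("gradient_boosting", "regression"): "sklearn.ensemble.GradientBoostingRegressor",
    ("gradient_boosting", "classification"): "sklearn.ensemble.GradientBoostingClassifier",
    ("support_vector_machine", "regression"): "sklearn.svm.SVR",
    ("support_vector_machine", "classification"): "sklearn.svm.SVC",
    ("decision_tree", "regression"): "sklearn.tree.DecisionTreeRegressor",
    ("decision_tree", "classification"): "sklearn.tree.DecisionTreeClassifier",
}


def estimator_path(algorithm="random_forest", predictive_type="regression"):
    """Return the dotted scikit-learn class path for a documented option pair."""
    try:
        return ESTIMATORS[(algorithm, predictive_type)]
    except KeyError:
        raise ValueError(
            "Unsupported combination: algorithm=%r, predictive_type=%r"
            % (algorithm, predictive_type)
        ) from None


print(estimator_path("support_vector_machine", "classification"))
```

The defaults in the sketch mirror the documented ModelApp defaults ('random_forest' and 'regression'). How ModelApp itself reacts to an unsupported value is not documented here, so the ValueError is an assumption of this sketch.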

Dependencies

Visual Options

See options.py for visual customization of some sizes and colors

Example

Based on a Random Forest Classifier example by Chris Albon.

# Import the DVHA Machine Learning Core application
from dvhamlc.gui import ModelApp

# Load the library with the iris dataset
from sklearn.datasets import load_iris

# Load pandas and numpy
import pandas as pd
import numpy as np

# Set random seed
np.random.seed(0)

# Create an object called iris with the iris data
iris = load_iris()

# Create a dataframe with the four feature variables
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Get column of the species names, this is what we are going to try to predict
y = pd.Categorical.from_codes(iris.target, iris.target_names)

# Launch the DVHA Machine Learning Core application
ModelApp(df, y, predictive_type='classification')
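
The optional y_categories and x_variables parameters are described above as a per-sample label array and a per-feature name array. The following is a minimal sketch of preparing those inputs alongside numeric class codes. The measurements are made-up toy values, the launch helper is hypothetical, and it is an assumption that ModelApp accepts plain Python lists as array-like input (convert to NumPy arrays if it does not).

```python
# Toy, made-up measurements: 4 samples x 2 features (illustrative values only).
X = [[5.0, 1.5], [6.0, 4.5], [5.2, 1.4], [6.5, 5.5]]
labels = ["setosa", "versicolor", "setosa", "virginica"]
x_variables = ["sepal length (cm)", "petal length (cm)"]  # one name per feature

# Encode each distinct label as an integer code, in order of first appearance.
code_of = {}
for label in labels:
    code_of.setdefault(label, len(code_of))

y = [code_of[label] for label in labels]  # numeric targets for the algorithm
y_categories = labels                     # one label per sample, for hover text


def launch():
    """Open the GUI. Needs dvha-mlc and a display, so it is not called here."""
    from dvhamlc.gui import ModelApp  # imported lazily; requires dvha-mlc

    ModelApp(X, y, predictive_type="classification", y_variable="Species",
             y_categories=y_categories, x_variables=x_variables)
```

Call launch() from a Python console to open the application. A real dataset would need far more than four samples for the train/test split to be meaningful.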

TODO

  • Design a view for cross-validation (i.e., load a model with new data, no new modeling)
  • Store a history of models for comparison
  • Methods for hyper-parameter grid search
  • Add more analysis tools (e.g., confusion matrix)