DVHA-Machine-Learning-Core
Core library of machine learning visualizations developed for DVH Analytics, adapted for generic use. Under active development.
Provide X and y arrays and get a GUI for data exploration and visualization. The application handles data splitting and modeling for you; the algorithms are based on scikit-learn.
The code is built upon these core libraries:
- DVH Analytics - A DICOM Database Application for Radiation Oncology
- wxPython Phoenix - Build a native GUI on Windows, Mac, or Unix systems
- Bokeh - Interactive Web Plotting for Python
- scikit-learn - Machine Learning in Python
To Run
NOTE: This application supports Python >= 3.5
Either clone this project or install from PyPI:
pip install dvha-mlc
From a Python 3 console:
from dvhamlc.gui import ModelApp
ModelApp(X, y)
Optionally specify an algorithm:
ModelApp(X, y, algorithm='random_forest', predictive_type='regression')
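As a minimal sketch of what X and y might look like, the snippet below builds a small synthetic regression dataset from plain Python lists (assumed here to be acceptable as array-like input, as they are for scikit-learn). The helper names `make_toy_data` and `launch` are hypothetical and not part of dvha-mlc; the GUI call is wrapped in a function so that nothing opens until you call it.

```python
import random


def make_toy_data(n_samples=50, n_features=3, seed=0):
    """Return (X, y): X has shape (n_samples, n_features), y has shape (n_samples)."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 10) for _ in range(n_features)] for _ in range(n_samples)]
    # y is a noisy linear combination of the features (illustrative data only)
    y = [sum((j + 1) * x for j, x in enumerate(row)) + rng.gauss(0, 1) for row in X]
    return X, y


def launch(X, y):
    """Open the GUI; requires dvha-mlc and its GUI dependencies to be installed."""
    # Imported lazily so the data helper above can be used on its own
    from dvhamlc.gui import ModelApp
    return ModelApp(X, y, algorithm='random_forest', predictive_type='regression')


X, y = make_toy_data()
print(len(X), len(X[0]), len(y))
```

Calling `launch(X, y)` afterwards should be equivalent to the `ModelApp(X, y, ...)` call shown above, assuming the package is installed.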
class ModelApp(X, y, algorithm='random_forest', predictive_type='regression', y_variable='Dependent Variable', y_categories=None, x_variables=None)
Parameters
- X : array-like, shape (n_samples, n_features)
- y : array-like, shape (n_samples)
- algorithm : string, optional (default='random_forest')
  - Defines which machine learning algorithm to use. Currently supported values (and their sources) include:
    - random_forest
      - sklearn.ensemble.RandomForestRegressor
      - sklearn.ensemble.RandomForestClassifier
    - gradient_boosting
      - sklearn.ensemble.GradientBoostingRegressor
      - sklearn.ensemble.GradientBoostingClassifier
    - support_vector_machine
      - sklearn.svm.SVR
      - sklearn.svm.SVC
    - decision_tree
      - sklearn.tree.DecisionTreeRegressor
      - sklearn.tree.DecisionTreeClassifier
- predictive_type : string, optional (default='regression')
  - May be either 'regression' or 'classification'
- y_variable : string, optional (default='Dependent Variable')
  - Visual attribute only; sets the y-axis title
- y_categories : array-like, shape (n_samples), optional (default=None)
  - Only applicable when predictive_type='classification'. This array maps the numerical values fed into the machine learning algorithm to labels; these labels appear in the hover messages upon plot inspection.
- x_variables : array-like, shape (n_features), optional (default=None)
  - The feature importance plot uses this data to display variable names instead of column indices
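The pairing of `algorithm` and `predictive_type` with scikit-learn classes listed above can be restated as a simple lookup. This is only a sketch of the documented mapping, not dvha-mlc's internal code; the names `ESTIMATORS` and `estimator_path` are hypothetical.

```python
# Restates the documented options as a lookup table (illustrative only)
ESTIMATORS = {
    'random_forest': {
        'regression': 'sklearn.ensemble.RandomForestRegressor',
        'classification': 'sklearn.ensemble.RandomForestClassifier',
    },
    'gradient_boosting': {
        'regression': 'sklearn.ensemble.GradientBoostingRegressor',
        'classification': 'sklearn.ensemble.GradientBoostingClassifier',
    },
    'support_vector_machine': {
        'regression': 'sklearn.svm.SVR',
        'classification': 'sklearn.svm.SVC',
    },
    'decision_tree': {
        'regression': 'sklearn.tree.DecisionTreeRegressor',
        'classification': 'sklearn.tree.DecisionTreeClassifier',
    },
}


def estimator_path(algorithm='random_forest', predictive_type='regression'):
    """Return the dotted scikit-learn class path for a pair of options."""
    if algorithm not in ESTIMATORS:
        raise ValueError('Unsupported algorithm: {!r}'.format(algorithm))
    by_type = ESTIMATORS[algorithm]
    if predictive_type not in by_type:
        raise ValueError('Unsupported predictive_type: {!r}'.format(predictive_type))
    return by_type[predictive_type]


print(estimator_path('support_vector_machine', 'classification'))
```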
Dependencies
- Python >=3.5
- wxPython Phoenix >= 4.0.4
- Bokeh >= 1.2.0
- NumPy (tested with 1.16.4)
- Scikit-learn (tested with 0.21.2)
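One way to compare an environment against the versions listed above is sketched below. It assumes the PyPI distribution names `wxPython`, `bokeh`, `numpy`, and `scikit-learn`, and it relies on `importlib.metadata`, which is only in the standard library from Python 3.8 onward; the helper names are hypothetical.

```python
import re
from importlib import metadata  # standard library in Python 3.8+

# Minimum or tested versions copied from the dependency list
REQUIRED = {
    'wxPython': '4.0.4',
    'bokeh': '1.2.0',
    'numpy': '1.16.4',
    'scikit-learn': '0.21.2',
}


def version_tuple(version):
    """Turn '1.16.4' into (1, 16, 4), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(required=REQUIRED):
    """Return {name: (installed_version_or_None, ok)} for each distribution."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report


for name, (installed, ok) in check_dependencies().items():
    print(name, installed, 'OK' if ok else 'missing or older than listed')
```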
Visual Options
See options.py for visual customization of some sizes and colors
Example
Based on a Random Forest Classifier example by Chris Albon.
# Import the DVHA Machine Learning Core application
from dvhamlc.gui import ModelApp
# Load the library with the iris dataset
from sklearn.datasets import load_iris
# Load pandas and numpy
import pandas as pd
import numpy as np
# Set random seed
np.random.seed(0)
# Create an object called iris with the iris data
iris = load_iris()
# Create a dataframe with the four feature variables
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Get the species names; this is what we are going to try to predict
y = pd.Categorical.from_codes(iris.target, iris.target_names)
# Launch the DVHA Machine Learning Core application
ModelApp(df, y, predictive_type='classification')
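The parameter list describes y_categories as mapping the numerical values fed to the algorithm onto labels. A hedged sketch of preparing such inputs by hand follows. It assumes, from the documented shape (n_samples), that y_categories holds one label per sample; the helper `encode_labels` is hypothetical and not part of dvha-mlc.

```python
def encode_labels(labels):
    """Map string labels to integer codes, numbered in sorted label order."""
    categories = sorted(set(labels))
    code_of = {label: code for code, label in enumerate(categories)}
    codes = [code_of[label] for label in labels]
    return codes, categories


species = ['setosa', 'virginica', 'versicolor', 'setosa']
y_codes, categories = encode_labels(species)

# One label per sample, matching the documented shape (an assumption)
y_categories = [categories[code] for code in y_codes]
print(y_codes, categories)

# Hypothetical call, left commented out because it opens a GUI:
# ModelApp(df, y_codes, predictive_type='classification',
#          y_categories=y_categories, x_variables=list(df.columns))
```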
TODO
- Design a view for cross-validation (i.e., load a model with new data, no new modeling)
- Store a history of models for comparison
- Methods for hyper-parameter grid search
- Add more analysis tools (e.g., confusion matrix)