README v3.0 / 14 MARCH 2018
QSPR MODELING: APPLICATION OF MACHINE LEARNING ALGORITHMS IN CLASSIFYING THE FAMILY AND PREDICTING FLASH POINTS AND CETANE NUMBER OF BIOFUEL COMPOUNDS
Introduction
This biofuel software predicts the family of an input chemical and then predicts its thermo-physical properties (flash point and cetane number) according to that family. The GUI is built with tkinter. The machine learning approach uses numerical regression and classification methods, including MLPR, GRNN, OLS, PLS, KNN, SVM, and LDA, to predict the family and the properties.
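The two-stage idea described above (classify the family first, then apply a property model for that family) can be sketched as follows. This is a minimal illustration only, not the project's implementation: it swaps in a plain-Python KNN classifier and a one-descriptor least-squares fit instead of the library models, and the descriptor values, property values, and the family names family_A and family_B are made-up placeholders.

```python
import math
from collections import Counter

# Placeholder training data (NOT real measurements): two descriptors per
# molecule, a hypothetical family label, and a hypothetical property value.
DESCRIPTORS = [(1.0, 0.1), (1.2, 0.2), (0.9, 0.0), (5.0, 2.0), (5.2, 2.1), (4.8, 1.9)]
FAMILIES = ["family_A"] * 3 + ["family_B"] * 3
PROPERTY = [10.0, 12.0, 9.0, 55.0, 57.0, 53.0]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_line(xs, ys):
    """Ordinary least squares for a single descriptor: (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_family_models():
    """Fit one regression line per family, using only the first descriptor."""
    models = {}
    for family in set(FAMILIES):
        rows = [i for i, label in enumerate(FAMILIES) if label == family]
        xs = [DESCRIPTORS[i][0] for i in rows]
        ys = [PROPERTY[i] for i in rows]
        models[family] = fit_line(xs, ys)
    return models


def predict(x, models):
    """Stage 1: classify the family. Stage 2: apply that family's regressor."""
    family = knn_predict(DESCRIPTORS, FAMILIES, x)
    slope, intercept = models[family]
    return family, slope * x[0] + intercept


if __name__ == "__main__":
    print(predict((1.1, 0.1), fit_family_models()))
```

Fitting a separate regressor per family keeps each property model local to chemically similar compounds, which is the motivation for classifying first.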
Usage
To predict the family and the thermo-physical properties of a molecule, run the software by following the steps below.
- Clone the GitHub repository:
  git clone https://github.com/Zhangjt9317/Biofuel-Group-Project.git
- Change into the project directory in bash:
  cd Biofuel-Group-Project/MyProject
- Open the graphical user interface:
  python Project_GUI.py
- Enter the CID number of the chemical and click Get CID to confirm the input. If Get CID is not clicked, no CID is passed to the machine learning models.
- Click Model selection to choose a machine learning method and a property, then click Begin to confirm the selection.
- Click Result to plot the training and prediction results.
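For readers who want to check a CID outside the GUI, the lookup step can be approximated with PubChemPy. This is a hedged sketch rather than the code behind the Get CID button: parse_cid and describe_compound are hypothetical helper names, and the lookup requires network access to PubChem.

```python
def parse_cid(text):
    """Return the CID as a positive int, or raise ValueError for bad input."""
    cleaned = text.strip()
    if not cleaned.isdecimal() or int(cleaned) == 0:
        raise ValueError(f"CID must be a positive integer, got {text!r}")
    return int(cleaned)


def describe_compound(cid):
    """Fetch basic PubChem metadata for a CID (needs network access)."""
    # Imported lazily so parse_cid can be used without PubChemPy installed.
    import pubchempy as pcp

    compound = pcp.Compound.from_cid(cid)
    return {
        "cid": compound.cid,
        "formula": compound.molecular_formula,
        "name": compound.iupac_name,
    }


if __name__ == "__main__":
    # CID 702 is ethanol in PubChem; any valid CID can be used instead.
    print(describe_compound(parse_cid("702")))
```

Validating the text before the lookup mirrors the GUI behaviour where nothing reaches the models until a CID has been confirmed.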
Contribution
- Issue Tracker: https://github.com/Zhangjt9317/Biofuel-Group-Project/issues
- Source Code: https://github.com/Zhangjt9317/Biofuel-Group-Project
Requirements
This program runs on Python. The following packages must be installed in the local environment: Openbabel, Neupy, Numpy, Matplotlib, Pandas, Pubchempy, Sklearn, tkinter, xlrd. Several of them are described below.
- NeuPy: Neural Networks package in Python.
- Open Babel: Search, convert, analyze, or store data from molecular modeling.
- PubChemPy: Enables chemical searches by CID, name, or substructure, and conversion between different chemical file formats.
- Pybel: Python wrapper that provides convenient access to the Open Babel toolkit.
- Tkinter: Standard Python interface to the Tk GUI toolkit.
- XLRD: Extract data from Excel spreadsheets.
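Before launching the GUI, it can help to confirm that the required packages are importable. The snippet below is a small sketch using only the standard library. The import names in REQUIRED_MODULES are assumptions inferred from the package list above (for example, scikit-learn is imported as sklearn), and missing_modules is a hypothetical helper, not part of the project.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages listed above.
REQUIRED_MODULES = [
    "openbabel", "neupy", "numpy", "matplotlib", "pandas",
    "pubchempy", "sklearn", "tkinter", "xlrd",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

find_spec only locates a module without importing it, so the check stays fast and does not trigger side effects such as opening a Tk window.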
Example
See Demo.ipynb in the example folder for a worked example of the software.
Credits
Jingtian Zhang, Cheng Zeng, Renlong Zheng, Chenggang Xi
Contact
If you are having issues, please contact Cheng Zeng (zengcheng95 --At-- gmail.com) or Jingtian Zhang (jtz9317 --At-- gmail.com).
License
The project is licensed under the MIT license.