🤝🤝🤝 Please star ⭐️ it for promoting open source projects 🌍 ! Thanks !
- (Patent) Zhang Tong-yi, Cao Bin, Yuan Hao, Wei Qinghua, Dong Ziqiang. Tree-Classifier for Gaussian Process Regression. CN 115017977 A (2022.09.06), GitHub : github.com/Bin-Cao/TCGPR.
- https://www.shu.edu.cn/info/1055/321555.htm
- https://mp.weixin.qq.com/s/aZ9bZY1z_Bcd6Uepu7D3Mg
- https://mp.weixin.qq.com/s/ssj5kP-RfIpKRS6QjcLHTQ
- user count
Tree-Classifier for Gaussian process regression
Written using Python, which is suitable for operating systems, e.g., Windows/Linux/MAC OS etc.
See Introduction
pip install PyTcgpr
pip show PyTcgpr
pip install --upgrade PyTcgpr
PyTcgpr/
__init__.py
TCGPR.py
data/
OutliersIdentification.py
DatasetPartition.py
feature/
FeaturesSelection.py
- Data Screening module | Partition
from PyTcgpr import TCGPR
dataSet = "data.csv"
initial_set_cap = 3
sampling_cap =2
up_search = 500
CV = 'LOOCV'
Task = 'Partition'
TCGPR.fit(
filePath = dataSet, initial_set_cap = initial_set_cap,Task=Task, sampling_cap = sampling_cap,
up_search = up_search, CV=CV
)
# note: default setting of Mission = 'DATA', No need to declare
- Data Screening module | Identification
from PyTcgpr import TCGPR
dataSet = "data.csv"
sampling_cap =2
up_search = 500
Task = 'Identification'
CV = 'LOOCV'
TCGPR.fit(
filePath = dataSet, Task = Task, sampling_cap = sampling_cap,
up_search = up_search,CV=CV
)
# note: default setting of Mission = 'DATA', No need to declare; initial_set_cap is masked
- Feature Selection module
from PyTcgpr import TCGPR
dataSet = "data.csv"
sampling_cap =2
Mission = 'FEATURE'
up_search = 500
CV = 'LOOCV'
TCGPR.fit(
filePath = dataSet, Mission = 'FEATURE', initial_set_cap = initial_set_cap, sampling_cap = sampling_cap,
up_search = up_search,CV=CV
)
# note: for feature selection, Mission should be declared as Mission = 'FEATURE' !
:param Mission : str, the mission of TCGPR,
default Mission = 'DATA' for data screening.
Mission = 'FEATURE' for feature selection.
:param filePath: the input dataset in csv format
:param initial_set_cap:
for Mission = 'DATA':
if Task = 'Partition':
initial_set_cap : the capacity of the initial dataset
int, default = 3, recommend = 3-10
or a list :
i.e.,
[3,4,8], means the 4-th, 5-th, 9-th datum will be collected as the initial dataset
elif Task = 'Identification':
param initial_set_cap is masked
for Mission = 'FEATURE':
initial_set_cap : the capacity of the initial featureset
int, default = 1, recommend = 1-5
or a list : i.e.,
[3,4,8], means the 4-th, 5-th, 9-th feature will be selected as the initial characterize
:param sampling_cap:
for Mission = 'DATA':
int, the number of data added to the updating dataset at each iteration, default = 1, recommend = 1-5
for Mission = 'FEATURE':
int, the number of features added to the updating feature set at each iteration, default = 1, recommend = 1-3
:param measure :Correlation criteria, default 'Pearson' means R values are used
or measure = 'Determination' means R^2 values are used
:param ratio:
for Mission = 'DATA':
if Task = 'Partition':
tolerance, lower boundary of R is (1-ratio)Rmax, default = 0.2, recommend = 0~0.3
elif Task = 'Identification':
tolerance, lower boundary of R is (1+ratio)R[last], default = 0.0, recommend = -0.01~0.01
for Mission = 'FEATURE':
tolerance, lower boundary of R is (1+ratio)R[last], default = -0.01, recommend = -0.01~0.01
:param target:
used in feature selection when Mission = 'FEATURE'
int, default 1, the number of target in regression mission
target = 1 for single_task regression and =k for k_task regression (Multiobjective regression)
otherwise : param target is masked
:param weight
a weight imposed on R value in calculating GGMF, default = .2 , recommend = .1-1
i.e.,
weight * (1-R) + mean / std (mean, std is the mean and standard deviation of length scales)
:param up_search:
for Mission = 'DATA':
up boundary of candidates for brute force search, default = 5e2 , recommend = 2e2-2e3
for Mission = 'FEATURE':
up boundary of candidates for brute force search, default = 10 , recommend = 10-20
:param exploit_coef: constrains to the magnitude of variance in Cal_EI function, default = 2, recommend = 2
:param exploit_model: boolean, default, False
exploit_model == True, the searching direction will be R only! GGMF will not be used!
:param CV: cross validation, default = 10
e.g. (int) CV = 5,10,... or str CV = 'LOOCV' for leave one out cross validation
- [Dataset remained by TCGPR.csv]
Maintained by Bin Cao. Please feel free to open issues in the Github or contact Bin Cao (bcao686@connect.hkust-gz.edu.cn) in case of any problems/comments/suggestions in using the code.
Contribution and suggestions are always welcome. In addition, we are also looking for research collaborations. You can submit issues for suggestions, questions, bugs, and feature requests, or submit pull requests to contribute directly. You can also contact the authors for research collaboration.