Report | Homepage | BgoFace UI
Please star ⭐ this project to support open-source development! For questions or collaboration, contact: Dr. Bin Cao (bcao686@connect.hkust-gz.edu.cn)
Bgolearn is a lightweight and extensible Python package for Bayesian global optimization, built for accelerating materials discovery and design. It provides out-of-the-box support for regression and classification tasks, implements various acquisition strategies, and offers a seamless pipeline for virtual screening, active learning, and multi-objective optimization.
📦 Official PyPI:
pip install Bgolearn
🎥 Code tutorial (BiliBili): Watch here · Colab Demo: Run it online
pip install Bgolearn              # install
pip install --upgrade Bgolearn    # upgrade to the latest release
pip show Bgolearn                 # check the installed version
import Bgolearn.BGOsampling as BGOS
import pandas as pd
# Load characterized dataset
data = pd.read_csv('data.csv')
x = data.iloc[:, :-1] # features
y = data.iloc[:, -1] # response
# Load virtual samples
vs = pd.read_csv('virtual_data.csv')
# Instantiate and run model
Bgolearn = BGOS.Bgolearn()
Mymodel = Bgolearn.fit(data_matrix=x, Measured_response=y, virtual_samples=vs)
# Get result using Expected Improvement
Mymodel.EI()
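According to the `fit` docstring reproduced at the end of this page, each acquisition call returns the utility (potential) of every virtual sample together with the recommended candidate(s). A minimal sketch of keeping those outputs; the two-value unpacking is an assumption based on that documented return:

potential, candidates = Mymodel.EI()  # utility per virtual sample, recommended candidate(s)
print('Suggested next candidate(s):', candidates)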
Install the extension toolkit:
pip install BgoKit
from BgoKit import ToolKit
# score_1 / score_2: per-sample scores of the two objectives on vs (see the sketch below)
Model = ToolKit.MultiOpt(vs, [score_1, score_2])
Model.BiSearch()
Model.plot_distribution()
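Here `score_1` and `score_2` are the per-sample scores of the two objectives evaluated on the virtual samples `vs`. One hedged way to obtain them, assuming two single-objective Bgolearn models (one per target property, with hypothetical responses `y1` and `y2`) and the two-value return described in the `fit` docstring:

# Hypothetical sketch: one single-objective model per target property
model_1 = Bgolearn.fit(data_matrix=x, Measured_response=y1, virtual_samples=vs)
model_2 = Bgolearn.fit(data_matrix=x, Measured_response=y2, virtual_samples=vs)
score_1, _ = model_1.EI()  # per-sample utility for objective 1
score_2, _ = model_2.EI()  # per-sample utility for objective 2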
See detailed demo: Multi-objective Example

- Expected Improvement (EI)
- Augmented Expected Improvement (AEI)
- Expected Quantile Improvement (EQI)
- Upper Confidence Bound (UCB)
- Probability of Improvement (PI)
- Predictive Entropy Search (PES)
- Knowledge Gradient (KG)
- Reinterpolation EI (REI)
- Expected Improvement with Plugin
- Least Confidence
- Margin Sampling
- Entropy-based approach
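The strategies above (Expected Improvement through Expected Improvement with Plugin for regression; Least Confidence, Margin Sampling, and the entropy-based approach for classification) are exposed as methods on the fitted model. Only `EI()` appears in the quick start; the other method names below are assumptions patterned on the abbreviations in this list:

# EI() is confirmed by the quick start above; UCB() and PI() are assumed
# names that mirror the listed abbreviations.
Mymodel.EI()   # Expected Improvement
Mymodel.UCB()  # Upper Confidence Bound (assumed name)
Mymodel.PI()   # Probability of Improvement (assumed name)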
BgoFace is the graphical frontend of Bgolearn, providing no-code access to its backend algorithms.
Supports a broad range of acquisition strategies (EI, UCB, KG, PES, etc.) for both single- and multi-objective optimization, and works well with the sparse, high-dimensional datasets common in materials science.
Use BgoKit and MultiBgolearn to implement Pareto optimization across multiple target properties (e.g., strength & ductility), enabling parallel evaluation across virtual samples.
Incorporates adaptive sampling in an active learning loop (experiment → prediction → update) to accelerate optimization with fewer experiments, as sketched below.
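A minimal sketch of that loop, assuming a user-supplied `run_experiment()` measurement step (a placeholder, not part of Bgolearn) and the return convention from the `fit` docstring:

import pandas as pd

# Hypothetical active-learning loop: recommend -> measure -> augment -> refit
for iteration in range(5):
    Mymodel = Bgolearn.fit(data_matrix=x, Measured_response=y, virtual_samples=vs)
    _, candidate = Mymodel.EI()        # recommended next experiment
    new_y = run_experiment(candidate)  # user-side measurement (placeholder)
    x = pd.concat([x, pd.DataFrame([candidate], columns=x.columns)], ignore_index=True)
    y = pd.concat([y, pd.Series([new_y])], ignore_index=True)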
- Nano Letters: Self-Driving Laboratory under UHV (Link)
- Small: ML-Engineered Nanozyme System for Anti-Tumor Therapy (Link)
- Computational Materials Science: Mg-Ca-Zn Alloy Optimization (Link)
- Measurement: Foaming Agent Optimization in EPB Shield Construction (Link)
- Intelligent Computing: Metasurface Design via Bayesian Learning (Link)
- Materials & Design: Lead-Free Solder Alloys via Active Learning (Link)
- npj Computational Materials: MLMD Platform with Bgolearn Backend (Link)
Released under the MIT License. 💼 Free for academic and commercial use. Please cite relevant publications if used in research.
We welcome community contributions and research collaborations:
- Submit issues for bug reports, ideas, or suggestions
- Submit pull requests for code contributions
- Contact Bin Cao (bcao686@connect.hkust-gz.edu.cn) for collaborations
Signature:
Bgolearn.fit(
    data_matrix,
    Measured_response,
    virtual_samples,
    Mission='Regression',
    Classifier='GaussianProcess',
    noise_std=None,
    Kriging_model=None,
    opt_num=1,
    min_search=True,
    CV_test=False,
    Dynamic_W=False,
    seed=42,
)
================================================================
:param data_matrix: data matrix of the training dataset, X.
:param Measured_response: response of the training dataset, y.
:param virtual_samples: designed virtual samples.
:param Mission: str, default 'Regression'; the optimization task. Mission = 'Regression' or 'Classification'
:param Classifier: the classifier applied when Mission == 'Classification'.
If the user does not supply one, Bgolearn falls back to a pre-set classifier.
Default: Classifier = 'GaussianProcess', i.e., a Gaussian Process Classifier.
Five classifiers are pre-set in Bgolearn:
'GaussianProcess' --> Gaussian Process Classifier (default)
'LogisticRegression' --> Logistic Regression
'NaiveBayes' --> Naive Bayes Classifier
'SVM' --> Support Vector Machine Classifier
'RandomForest' --> Random Forest Classifier
:param noise_std: float or ndarray of shape (n_samples,), default=None
Value added to the diagonal of the kernel matrix during fitting.
This can prevent a potential numerical issue during fitting, by
ensuring that the calculated values form a positive definite matrix.
It can also be interpreted as the variance of additional Gaussian
measurement noise on the training observations.
If noise_std is None, a noise value will be estimated by maximum
likelihood on the training dataset.
:param Kriging_model (default None):
str, Kriging_model = 'SVM', 'RF', 'AdaB', or 'MLP'
The corresponding machine learning model will be used: Support Vector Machine (SVM),
Random Forest (RF), AdaBoost (AdaB), or Multi-Layer Perceptron (MLP).
The estimation uncertainty is determined by Bootstrap sampling.
or
a user-defined callable Kriging model with a <fit_pre> attribute.
If the user does not supply one, Bgolearn falls back to a pre-set Kriging model.
attribute <fit_pre>:
input -> xtrain, ytrain, xtest ;
output -> predicted mean and std of xtest
e.g. (take GaussianProcessRegressor in sklearn):
class Kriging_model(object):
    def fit_pre(self, xtrain, ytrain, xtest):
        # instantiate the model
        kernel = RBF()
        model = GaussianProcessRegressor(kernel=kernel).fit(xtrain, ytrain)
        # define the attribute's outputs
        mean, std = model.predict(xtest, return_std=True)
        return mean, std
e.g. (multi-model estimation):
class Kriging_model(object):
    def fit_pre(self, xtrain, ytrain, xtest):
        # instantiate the models
        pre_1 = SVR(C=10).fit(xtrain, ytrain).predict(xtest)  # model_1
        pre_2 = SVR(C=50).fit(xtrain, ytrain).predict(xtest)  # model_2
        pre_3 = SVR(C=80).fit(xtrain, ytrain).predict(xtest)  # model_3
        # model_1, model_2, model_3 can be changed to any ML models you desire
        # define the attribute's outputs
        stacked_array = np.vstack((pre_1, pre_2, pre_3))
        means = np.mean(stacked_array, axis=0)
        std = np.sqrt(np.var(stacked_array, axis=0))
        return means, std
:param opt_num: the number of recommended candidates for the next iteration, default 1.
:param min_search: default True -> searching the global minimum ;
False -> searching the global maximum.
:param CV_test: 'LOOCV' or an int, default False (skip the test).
If CV_test = 'LOOCV', leave-one-out cross-validation will be applied;
elif CV_test is an int, e.g., CV_test = 10, 10-fold cross-validation will be applied.
:return: 1: array; potential of each candidate. 2: array/float; recommended candidate(s).
File: ~/miniconda3/lib/python3.9/site-packages/Bgolearn/BGOsampling.py
Type: method
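For instance, combining several of the documented options above (the argument values here are illustrative only):

# Illustrative call: maximize the target, recommend three candidates,
# and run a leave-one-out cross-validation check, per the options above.
Mymodel = Bgolearn.fit(
    data_matrix=x,
    Measured_response=y,
    virtual_samples=vs,
    Mission='Regression',
    min_search=False,  # search the global maximum
    opt_num=3,         # recommend three candidates
    CV_test='LOOCV',   # leave-one-out cross-validation
)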