SMPyBandits
Open-Source Python package for Single- and Multi-Players multi-armed Bandits algorithms.
This repository contains the code of Lilian Besson's numerical environment, written in Python (2 or 3), for numerical simulations of single-player and multi-players multi-armed bandit algorithms.
A complete Sphinx-generated documentation is on SMPyBandits.GitHub.io.
Quick presentation
It contains the most complete collection of single-player (classical) bandit algorithms on the Internet (over 65!), as well as implementations of all the state-of-the-art multi-player algorithms.
I follow the latest publications related to Multi-Armed Bandits (MAB) research very actively, and usually implement new algorithms quite quickly. For instance, Exp3++, CORRAL and SparseUCB were each introduced by articles presented at COLT in July 2017; LearnExp comes from a NIPS 2017 paper, and klUCB++ from an ALT 2017 paper. More recent examples are klUCB-switch, from a paper from May 2018, and MusicalChairNoSensing, from a paper from August 2018.
Classical MAB have a lot of applications: clinical trials, A/B testing, game tree exploration, online content recommendation, etc. (my framework does not implement contextual bandits yet).

Multi-players MAB have applications in Cognitive Radio, and my framework implements all the collision models found in the literature, as well as all the algorithms from the last 10 years or so (rhoRand from 2009, MEGA from 2015, MusicalChair, our state-of-the-art algorithms RandTopM and MCTopM, along with very recent algorithms SIC-MMAB from arXiv:1809.08151 and MusicalChairNoSensing from arXiv:1808.08416). I'm working on adding clean support for non-stationary MAB problems, and I will soon implement all state-of-the-art algorithms for these problems.
With this numerical framework, simulations can run on a single CPU or a multi-core machine, and summary plots are automatically saved as high-quality PNG, PDF and EPS (ready to be used in research articles).
Making new simulations is very easy: one only needs to write a configuration script, and basically no code! See these examples (files named configuration_*.py).
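For illustration, here is a hedged sketch of what such a configuration script can look like. The key names (horizon, repetitions, n_jobs, environment, policies) follow the pattern of the shipped configuration_*.py files, but note that in the real scripts arm_type and archtype receive actual classes imported from SMPyBandits, not the placeholder strings used here:

```python
# Hypothetical sketch of a configuration_*.py script: one dict named
# `configuration` describes the whole experiment.
configuration = {
    "horizon": 10000,      # time horizon T of each simulation
    "repetitions": 100,    # number N of repetitions
    "n_jobs": 4,           # number of CPU cores used by joblib
    "verbosity": 6,        # verbosity of the logs
    # one problem: 9 Bernoulli arms with these means
    "environment": [{
        "arm_type": "Bernoulli",   # a class (not a string) in the real scripts
        "params": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    }],
    # the policies that will be compared on this problem
    "policies": [
        {"archtype": "UCB", "params": {}},       # classes in the real scripts
        {"archtype": "Thompson", "params": {}},
    ],
}
```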
A complete Sphinx documentation for each algorithm and every piece of code (including constants in the configurations!) is available here: SMPyBandits.GitHub.io. (The documentation is built with Sphinx and hosted on GitHub Pages.)
I (Lilian Besson) started my PhD in October 2016, and this is part of my ongoing research since December 2016.
I launched the documentation in March 2017, wrote my first research articles using this framework in 2017, and decided to (finally) open-source my project in February 2018.
How to cite this work?
If you use this package for your own work, please consider citing it with this piece of BibTeX:
@misc{SMPyBandits,
title = {{SMPyBandits: an Open-Source Research Framework for Single and Multi-Players Multi-Arms Bandits (MAB) Algorithms in Python}},
author = {Lilian Besson},
year = {2018},
url = {https://github.com/SMPyBandits/SMPyBandits/},
howpublished = {Online at: \url{github.com/SMPyBandits/SMPyBandits}},
note = {Code at https://github.com/SMPyBandits/SMPyBandits/, documentation at https://smpybandits.github.io/}
}
I also wrote a small paper to present SMPyBandits, and I will send it to JMLR MLOSS. The paper can be consulted here on my website.
A DOI will arrive as soon as possible! I tried to publish a paper on both JOSS and MLOSS.
List of research publications using SMPyBandits
1st article, about the policy aggregation algorithm (aka model selection)
I designed and added the Aggregator policy, in order to test its validity and performance.
It is a "simple" voting algorithm to combine multiple bandit algorithms into one.
Basically, it behaves like a simple MAB bandit based only on empirical means (even simpler than UCB), where the arms are the child algorithms A_1 .. A_N, each running in "parallel".
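As an illustration only (this is NOT the actual Aggregator code), a minimal sketch of such a meta-bandit, which treats its child algorithms as arms and follows the child with the best empirical mean, could look like this; the choose/update interface of the children is an assumption of the sketch:

```python
import random

class EmpiricalMeanAggregator:
    """Toy meta-bandit (not the real Aggregator): each child algorithm is
    treated as an arm, and the aggregator follows the child with the best
    empirical mean reward, with a little forced exploration."""

    def __init__(self, children, epsilon=0.1):
        self.children = children
        self.epsilon = epsilon              # probability of forced exploration
        self.counts = [0] * len(children)   # times each child was followed
        self.sums = [0.0] * len(children)   # cumulated reward of each child
        self.last = 0                       # child followed at the last step

    def choose(self):
        if random.random() < self.epsilon or 0 in self.counts:
            self.last = random.randrange(len(self.children))  # explore
        else:
            means = [s / c for s, c in zip(self.sums, self.counts)]
            self.last = means.index(max(means))               # exploit
        return self.children[self.last].choose()

    def update(self, arm, reward):
        # credit the reward to the child that was followed, and let
        # every child observe the feedback, "in parallel"
        self.counts[self.last] += 1
        self.sums[self.last] += reward
        for child in self.children:
            child.update(arm, reward)
```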
For more details, refer to this file:
Aggregation.md
and this research article.
PDF: BKM_IEEEWCNC_2018.pdf | HAL notice: BKM_IEEEWCNC_2018 | BibTeX: BKM_IEEEWCNC_2018.bib | Source code and documentation
2nd article, about Multi-players Multi-Armed Bandits
There is another point of view: instead of comparing different single-player policies on the same problem, we can make them play against each other, in a multi-player setting.
The basic difference is about collisions: at each time t, if two or more users chose to sense the same channel, there is a collision. Collisions can be handled in different ways from the base station's point of view, and from each player's point of view.
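For instance, under the common "reward 0 on collision" model, one round of a toy simulation can be sketched as follows (a hypothetical illustration; the framework implements several other collision models too):

```python
from collections import Counter
import random

def play_round(choices, means, rng=None):
    """One time step of a toy multi-player model: players who chose the
    same channel collide and all receive reward 0 (the "0 reward on
    collision" model); a lone player gets a Bernoulli draw of its arm."""
    rng = rng or random.Random(42)
    counts = Counter(choices)
    rewards = []
    for arm in choices:
        if counts[arm] > 1:
            rewards.append(0.0)  # collision: everyone on this channel gets 0
        else:
            rewards.append(float(rng.random() < means[arm]))  # Bernoulli draw
    return rewards

# players 0 and 1 collide on channel 2; player 2 senses channel 0 alone
print(play_round([2, 2, 0], [0.9, 0.5, 0.7]))  # → [0.0, 0.0, 1.0] with the seeded rng
```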
For more details, refer to this file:
MultiPlayers.md
and this research article.
PDF: BK__ALT_2018.pdf | HAL notice: BK__ALT_2018 | BibTeX: BK__ALT_2018.bib | Source code and documentation
3rd article, using the Doubling Trick for Multi-Armed Bandits
I studied what the Doubling Trick can and cannot do to obtain efficient anytime versions of non-anytime optimal Multi-Armed Bandits algorithms.
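The idea can be sketched in a few lines: restart a horizon-dependent policy on a geometric sequence of growing horizons. This is a hedged illustration only; make_policy and play_one_horizon are hypothetical callables, not part of SMPyBandits:

```python
def doubling_trick(make_policy, play_one_horizon, T_total, T0=100, geometric=2):
    """Toy sketch of the Doubling Trick: run a horizon-dependent policy on
    successive horizons T0, 2*T0, 4*T0, ... restarting it from scratch each
    time, until T_total steps have been played overall."""
    t = 0
    horizon = T0
    while t < T_total:
        steps = min(horizon, T_total - t)   # don't overshoot the true horizon
        policy = make_policy(horizon)       # fresh policy tuned for this horizon
        play_one_horizon(policy, steps)     # play it for `steps` rounds
        t += steps
        horizon *= geometric                # geometric doubling sequence
    return t
```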
For more details, refer to this file:
DoublingTrick.md
and this research article.
PDF: BK__DoublingTricks_2018.pdf | HAL notice: BK__DoublingTricks_2018 | BibTeX: BK__DoublingTricks_2018.bib | Source code and documentation
4th article, about Piece-Wise Stationary Multi-Armed Bandits
With Emilie Kaufmann, we studied the Generalized Likelihood Ratio Test (GLRT) for sub-Bernoulli distributions, and proposed the B-GLRT algorithm for change-point detection in piece-wise stationary one-armed bandit problems. We combined the B-GLRT with the kl-UCB multi-armed bandit algorithm and proposed the GLR-klUCB algorithm for piece-wise stationary multi-armed bandit problems. We prove finite-time guarantees for the B-GLRT and the GLR-klUCB algorithm, and we illustrate their performance with extensive numerical experiments.
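To give the flavor of the approach, here is a toy version of a Bernoulli GLR change-point test: it scans every possible split point and compares the likelihood of a single mean against a change at that point. This is a sketch of the idea only, not the exact B-GLRT statistic or threshold from the paper:

```python
from math import log

def kl_bern(p, q, eps=1e-12):
    """Binary KL divergence kl(p, q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def glr_change_detected(rewards, threshold):
    """Toy Bernoulli GLR test: for every split point s, measure how much
    better "one mean before s, another after s" explains the data than a
    single common mean, and report a change if the best split exceeds
    the threshold."""
    n = len(rewards)
    mean_all = sum(rewards) / n
    best = 0.0
    for s in range(1, n):
        m1 = sum(rewards[:s]) / s           # empirical mean before the split
        m2 = sum(rewards[s:]) / (n - s)     # empirical mean after the split
        stat = s * kl_bern(m1, mean_all) + (n - s) * kl_bern(m2, mean_all)
        best = max(best, stat)
    return best > threshold
```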
For more details, refer to this file:
NonStationaryBandits.md
and this research article.
PDF: BK__COLT_2019.pdf | HAL notice: BK__COLT_2019 | BibTeX: BK__COLT_2019.bib | Source code and documentation
Other interesting things
Single-player Policies
- More than 65 algorithms, including all known variants of the UCB, klUCB, MOSS and Thompson Sampling algorithms, as well as other less known algorithms (OCUCB, BESA, OSSB, etc).
- For instance, SparseWrapper is a generalization of the SparseUCB from this article.
- Implementation of very recent Multi-Armed Bandits algorithms, e.g., klUCB++ (from this article), UCBdagger (from this article), or MOSSanytime (from this article).
- Experimental policies: BlackBoxOpt or UnsupervisedLearning (using Gaussian processes to learn the arms' distributions).
Arms and problems
- My framework mainly targets stochastic bandits, with arms following Bernoulli, bounded (truncated) or unbounded Gaussian, Exponential, Gamma or Poisson distributions, and more.
- The default configuration is to use a fixed problem for N repetitions (e.g. 1000 repetitions, use MAB.MAB), but there is also perfect support for "Bayesian" problems, where the mean vector µ1, …, µK changes at every repetition (see MAB.DynamicMAB).
- There is also good support for Markovian problems, see MAB.MarkovianMAB, even though I didn't implement any policies tailored for Markovian problems.
- I'm actively working on adding very clean support for non-stationary MAB problems, and MAB.PieceWiseStationaryMAB is already working well. Use it with policies designed for piece-wise stationary problems, like DiscountedThompson, one of the CD-UCB algorithms, M-UCB, SlidingWindowUCB or DiscountedUCB, or SW-UCB#.
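As a flavor of how such policies forget old data, here is a toy sliding-window index for one arm, in the spirit of SlidingWindowUCB; this is a hedged sketch, not the tuned index or constants from the papers:

```python
from collections import deque
from math import log, sqrt

class SlidingWindowUCBArm:
    """Toy sliding-window statistics for one arm: only the last `tau`
    rewards enter the UCB-style index, so old (pre-change) observations
    are gradually forgotten."""

    def __init__(self, tau=100):
        self.window = deque(maxlen=tau)   # keeps only the tau latest rewards

    def update(self, reward):
        self.window.append(reward)        # oldest reward falls out when full

    def index(self, t):
        if not self.window:
            return float("inf")           # force initial exploration
        n = len(self.window)
        mean = sum(self.window) / n       # windowed empirical mean
        return mean + sqrt(log(t) / (2 * n))
```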
Other remarks
- Everything here is done in an imperative, object-oriented style. The API of the Arms, Policy and MultiPlayersPolicy classes is documented in this file (API.md).
- The code is clean, valid for both Python 2 and Python 3.
- Some pieces of code come from the pymaBandits project, but most of them were refactored. Thanks to the initial project!
- G. Varoquaux's joblib is used for the Evaluator and EvaluatorMultiPlayers classes, so the simulations are easily parallelized on multi-core machines. (Put n_jobs = -1 or PARALLEL = True in the config file to use all your CPU cores, as it is by default.)
How to run the experiments?
See this document: How_to_run_the_code.md for more details (or this documentation page).
TL;DR: this short bash snippet shows how to clone the code, install the requirements for Python 3 (in a virtualenv), and start some simulations for N=100 repetitions of the default non-Bayesian Bernoulli-distributed problem, with K=9 arms, a horizon of T=10000 and on 4 CPUs (it should take about 20 minutes for each simulation):
cd /tmp/ # or wherever you want
git clone -c core.symlinks=true https://GitHub.com/SMPyBandits/SMPyBandits.git
cd SMPyBandits
# just be sure you have the latest virtualenv from Python 3
sudo pip3 install --upgrade --force-reinstall virtualenv
# create and activate the virtualenv
virtualenv venv
. venv/bin/activate
type pip # check it is /tmp/SMPyBandits/venv/bin/pip
type python # check it is /tmp/SMPyBandits/venv/bin/python
# install the requirements in the virtualenv
pip install -r requirements_full.txt
# run a singleplayer simulation!
N=100 T=10000 K=9 N_JOBS=4 make single
# run a multiplayer simulation!
N=100 T=10000 M=3 K=9 N_JOBS=4 make more
You can also install it directly with pip and from GitHub:
cd /tmp/ ; mkdir SMPyBandits ; cd SMPyBandits/
virtualenv venv
. venv/bin/activate
type pip # check it is /tmp/SMPyBandits/venv/bin/pip
type python # check it is /tmp/SMPyBandits/venv/bin/python
pip install git+https://github.com/SMPyBandits/SMPyBandits.git#egg=SMPyBandits[full]
- If speed matters to you and you want to use algorithms based on klUCB, you should take the time to build and install the fast C implementation of the utility KL functions. The default is to use kullback.py, but using the C version from Policies/C/ really speeds up the computations. Just follow the instructions, it should work well (you need gcc to be installed). And if speed matters, be sure that you have a working version of Numba: it is used by many small functions to (try to automatically) speed up the computations.
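For reference, the quantity these utilities compute is the binary KL divergence. A pure-Python version with an optional Numba acceleration can be sketched like this (a simplified illustration, not the exact code of kullback.py):

```python
from math import log

def klBern(x, y, eps=1e-7):
    """Pure-Python binary KL divergence kl(x, y), the quantity that the
    optional C extension (Policies/C/) computes much faster. Inputs are
    clipped away from 0 and 1 to avoid log(0)."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))

# optional: let Numba JIT-compile it when available, fall back otherwise
try:
    from numba import jit
    klBern = jit(nopython=True)(klBern)
except ImportError:
    pass
```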
💥 Warning
- This work is still experimental, even if it is well tested and stable! It's active research. It should be completely bug-free and every single module/file should work perfectly (as this pylint log and this other one say), but bugs are sometimes hard to spot, so if you encounter any issue, please file a bug ticket.
- Whenever I add a new feature, I run experiments to check that nothing is broken (and Travis CI helps too). But there are no unit tests (I don't have the time); you will have to trust me 😎!
- This project is NOT meant to be a library that you can use elsewhere, but a research tool.
Contributing?
I don't expect issues or pull requests on this project, but you are welcome to open them.
Contributions (issues, questions, pull requests) are of course welcome, but this project is and will stay a personal environment designed for quick research experiments, and will never try to be an industry-ready module for applications of Multi-Armed Bandits algorithms. If you want to contribute, please have a look at the CONTRIBUTING.md file, and if you want to be more seriously involved, read the CODE_OF_CONDUCT.md file.
- You are welcome to submit an issue, if it was not previously answered.
- If you have an interesting example of use of SMPyBandits, please share it! (Jupyter Notebooks are preferred.) And file a pull request to add it to the notebook examples.
💥 TODO
See this file: TODO.md, and the issues on GitHub.
📜 License?
MIT Licensed (file LICENSE).
© 2016-2018 Lilian Besson, with help from contributors.