Bayesian Factorization Methods


Keywords
bayesian, factorization, machine-learning, high-dimensional, side-information, bayesian-inference, gibbs-sampling, latent-fact-model, latent-features, matrix-factorization, probabilistic-matrix-factorization, python
License
MIT
Install
pip install smurff==0.16.2


SMURFF - Scalable Matrix Factorization Framework


What is Bayesian Matrix Factorization

Matrix factorization is a common machine learning technique for recommender systems, for example to recommend books on Amazon or movies on Netflix.

Matrix Factorization

The idea of these methods is to approximate the user-movie rating matrix R as a product of two low-rank matrices U and V such that R ≈ U × Vᵀ. U and V are fitted to the known ratings in R, which is usually very sparsely filled. Recommendations can then be made from the approximation U × Vᵀ, which is dense. If R has dimensions M × N, then U and V have dimensions M × K and N × K, where K is the number of latent features, typically much smaller than M and N.
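As a rough illustration of the shapes involved, the sketch below fits U (M × K) and V (N × K) to a handful of known ratings and then fills in the dense approximation. It uses plain stochastic gradient descent, which is a non-Bayesian stand-in and not how SMURFF works internally. The function names (`predict`, `factorize`, `rmse`), the toy ratings and all parameter values are assumptions made for this example only.

```python
import random

def predict(U, V, i, j):
    """Predicted rating for user i and movie j: dot product of row i of U and row j of V."""
    return sum(u * v for u, v in zip(U[i], V[j]))

def rmse(ratings, U, V):
    """Root-mean-square error over the observed (user, movie, rating) triples."""
    return (sum((r - predict(U, V, i, j)) ** 2 for i, j, r in ratings) / len(ratings)) ** 0.5

def factorize(ratings, M, N, K, epochs=2000, lr=0.02, reg=0.02, seed=0):
    """Fit U (M x K) and V (N x K) to the observed ratings with regularized SGD."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(M)]
    V = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(N)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - predict(U, V, i, j)
            for k in range(K):
                u_ik, v_jk = U[i][k], V[j][k]
                # Gradient step on the squared error plus an L2 penalty.
                U[i][k] += lr * (err * v_jk - reg * u_ik)
                V[j][k] += lr * (err * u_ik - reg * v_jk)
    return U, V

# Toy sparse rating matrix R (4 users x 3 movies): only 8 of the 12 entries are known.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
M, N, K = 4, 3, 2

U, V = factorize(ratings, M, N, K)

# The dense approximation has a value for every (user, movie) pair,
# including the four pairs that were never rated.
R_hat = [[predict(U, V, i, j) for j in range(N)] for i in range(M)]
train_rmse = rmse(ratings, U, V)
```

The unrated pairs, such as user 0 and movie 2, get a prediction from `R_hat` even though they never appear in `ratings`; that is the sense in which the approximation is dense.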

Bayesian probabilistic matrix factorization (BPMF) has been shown to be more robust to overfitting than non-Bayesian matrix factorization.
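To give a feel for what the Bayesian treatment means in practice, here is a deliberately simplified Gibbs sampler. Instead of a single point estimate of U and V, it draws many samples from their posterior and averages the resulting predictions. The simplifications are assumptions of this sketch, not a description of BPMF or of SMURFF's code: the model is rank 1 (K = 1), the noise precision `alpha` and prior precision `lam` are fixed by hand, and the name `gibbs_rank1` and the toy data are hypothetical. Full BPMF additionally samples the hyperparameters of the priors.

```python
import random

def gibbs_rank1(ratings, M, N, alpha=5.0, lam=1.0, burnin=50, nsamples=200, seed=0):
    """Gibbs sampling for the rank-1 model r_ij ~ Normal(u_i * v_j, 1/alpha)
    with fixed priors u_i ~ Normal(0, 1/lam) and v_j ~ Normal(0, 1/lam).
    Returns the M x N matrix of predictions averaged over the kept samples."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(M)]
    v = [rng.gauss(0, 1) for _ in range(N)]
    # Observed ratings grouped per user and per movie.
    by_user = [[(j, r) for i2, j, r in ratings if i2 == i] for i in range(M)]
    by_movie = [[(i, r) for i, j2, r in ratings if j2 == j] for j in range(N)]
    mean_pred = [[0.0] * N for _ in range(M)]

    def resample(own, other, observed):
        # Given the other factor, each entry of `own` has a Gaussian conditional
        # posterior with the precision and mean computed below.
        for a, obs in enumerate(observed):
            precision = lam + alpha * sum(other[b] ** 2 for b, _ in obs)
            mean = alpha * sum(r * other[b] for b, r in obs) / precision
            own[a] = rng.gauss(mean, precision ** -0.5)

    for it in range(burnin + nsamples):
        resample(u, v, by_user)
        resample(v, u, by_movie)
        if it >= burnin:
            # Accumulate the running average of the predictions u_i * v_j.
            for i in range(M):
                for j in range(N):
                    mean_pred[i][j] += u[i] * v[j] / nsamples
    return mean_pred

# Toy data generated from u = [1, 2, 3] and v = [1, 2];
# the entries (1, 1) and (2, 0) are left unobserved.
ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (2, 1, 6.0)]
pred = gibbs_rank1(ratings, M=3, N=2)
```

Because the prediction is an average over many plausible factorizations rather than one fitted solution, a single unlucky fit has less influence on the result, which is the intuition behind the robustness claim above.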

What is SMURFF

SMURFF is a highly optimized and parallelized framework for Bayesian matrix and tensor factorization. SMURFF supports multiple matrix factorization methods:

  • BPMF, the basic version;
  • Macau, adding support for high-dimensional side information to the factorization;
  • GFA, doing Group Factor Analysis.

Macau and BPMF can also perform tensor factorization.

Examples

Documentation is generated from Jupyter Notebooks. You can find the notebooks in docs/notebooks and the resulting documentation on smurff.readthedocs.io

Installation

Using conda:

conda install -c vanderaa smurff

Compile from source code: see INSTALL.rst

Contributors

  • Jaak Simm (Macau C++ version, Cython wrapper, Macau MPI version, Tensor factorization)
  • Tom Vander Aa (OpenMP optimized BPMF, Matrix Cofactorization and GFA, Code Reorg)
  • Adam Arany (Probit noise model)
  • Tom Haber (Original BPMF code)
  • Andrei Gedich
  • Ilya Pasechnikov
  • Thanh Le Van (synthetic out-of-matrix prediction example)
  • Xiangju Qin (BPMF using posterior propagation)

Citing SMURFF

If you are using SMURFF in a scientific publication, please cite the following preprint plus the paper describing the corresponding algorithm:

SMURFF: a High-Performance Framework for Matrix Factorization. arXiv preprint arXiv:1904.02514

When using pure Bayesian Probabilistic Matrix Factorization, please also cite:

Salakhutdinov R, Mnih A. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine learning (ICML '08), 2008. ACM, New York, NY, USA, 880-887.

When using Bayesian Factorization with Side Information, please also cite:

Simm J, Arany Á, Zakeri P, Haber T, Wegner JK, Chupakhin V, Ceulemans H, Moreau Y. Macau: Scalable Bayesian Factorization with High-Dimensional Side Information Using MCMC. Proc. of the Machine Learning for Signal Processing (MLSP), 2017 IEEE 27th International Workshop on MLSP; 2017; Vol. 2017-September; pp. 1-6. Tokyo, Japan.

When using Group Factor Analysis, please also cite:

Klami A, Virtanen S, Leppäaho E, Kaski S., "Group Factor Analysis," in IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2136-2147, Sept. 2015.

Acknowledgements

Over the course of the last 5 years, this work has been supported by the EU H2020 FET-HPC projects EPEEC (contract #801051), ExCAPE (contract #671555) and EXA2CT (contract #610741), and the Flemish Exaptation project.