tsnmf

Implementation of Topic-Supervised Non-Negative Matrix Factorization


Keywords
tsnmf, nmf, sparse
Licenses
BSD-3-Clause/Zed
Install
pip install tsnmf==1.0.4

Documentation

tsnmf-sparse

This repository contains an implementation of Topic-Supervised Non-Negative Matrix Factorization (TS-NMF) [1] with Sparse Matrices in Python, using a Scikit-Learn's compatible API.

How it Works

From [1]: Suppose that one supervises k << n documents and identifies l << t topics that were contained in a subset of the documents. One can supervise the NMF method using this information, represented by an n×d topic supervision matrix L.The elements of L contrain the importance weights of matrix W and are of the following form:

Then, for a term-document matrix V and supervision matrix L, TS-NMF seeks matrices W and H that minimize

Where ○ represent the Hadamard (element-wise) product operator.

Installation

You can install TS-NMF via pip:

pip install tsnmf

Or clonning this repository and running setup.py:

python setup.py install

Usage

TS-NMF is used in a similar way as the module decomposition.NMF from Scikit-Learn. The extra thing that you need is a list of list that contains the labels to build the matrix L.

Suppose you want to get 3 topics from 5 documents. The 5 documents should be represented in a matrix V, the most used way is apply a TF-IDF Vectorizer, which reflect how important a word is to a document.

Each element of the list of list of labels correspond to a document. These elements contain a list of topics that contrain the document. For example

labels = [[],
          [0,2], # document 1
          [],
          [],
          [1]] # document 4

means that the document 1 is contrained to be topic 0 or 2 and document 4 to be topic 1. For the other documents all the topics are permitted.

Finally, to run TS-NMF:

from tsnmf import TSNMF

tsnmf = TSNMF(n_components=3, random_state=1)
W = tsnmf.fit_transform(V, labels=labels)
H = tsnmf.components_

Credits

  • Developed mainly by Victor Navarro (@vokturz), under the guidance of Eduardo Graells-Garrido (@carnby), in the context of CONICYT Fondo de Fomento al Desarrollo Científico y Tecnológico (FONDECYT) Proyecto de Iniciación 11180913.
  • Based on scikit-learn's NMF code and the original ws-nmf.

References

  1. MacMillan, Kelsey, and James D. Wilson. "Topic supervised non-negative matrix factorization." arXiv preprint arXiv:1706.05084 (2017).