Broca's Language Model
Broca's LM is a free Python library providing a probabilistic language model based on a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). It uses Gensim's Word2Vec implementation to transform input word sequences into a dense vector space. The output of the model is a sequence of probability distributions over the given vocabulary.
This library is named after Broca's area, one of the main language processors in the human brain. Broca's area is responsible for language comprehension as well as language production.
The model can be used to:
- determine sentence probabilities
- generate random sentences (sampling)
- continue incomplete word sequences (sampling)
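To illustrate how a sequence of probability distributions leads to the uses listed above, here is a minimal sketch in plain NumPy. It does not use the Broca's LM API: the function names `sentence_log_probability` and `sample_word`, the toy vocabulary, and the hand-written distributions are all assumptions made for illustration. A real model would produce the distributions itself.

```python
import numpy as np


def sentence_log_probability(distributions, word_ids):
    """Sum the log-probabilities assigned to each observed word.

    distributions: one probability distribution over the vocabulary per position.
    word_ids: the vocabulary index of the word actually observed at each position.
    """
    distributions = np.asarray(distributions, dtype=float)
    word_ids = np.asarray(word_ids, dtype=int)
    # Select the probability of the observed word at every position.
    step_probs = distributions[np.arange(len(word_ids)), word_ids]
    # Summing logs avoids underflow that multiplying many small values causes.
    return float(np.sum(np.log(step_probs)))


def sample_word(distribution, rng):
    """Draw one vocabulary index from a single probability distribution."""
    return int(rng.choice(len(distribution), p=distribution))


# Hypothetical toy data, not produced by a trained model.
vocabulary = ["the", "cat", "sleeps", "</s>"]
distributions = [
    [0.7, 0.1, 0.1, 0.1],  # distribution for position 1
    [0.1, 0.6, 0.2, 0.1],  # distribution for position 2
    [0.1, 0.1, 0.5, 0.3],  # distribution for position 3
]
word_ids = [0, 1, 2]  # "the cat sleeps"

log_p = sentence_log_probability(distributions, word_ids)

# Sampling a word from the last distribution, as done when generating
# or continuing a word sequence.
rng = np.random.default_rng(0)
next_word = vocabulary[sample_word(distributions[-1], rng)]
```

Here the sentence probability is the product of the per-position probabilities of the observed words (0.7 × 0.6 × 0.5 in this toy case), computed in log space. Generating or continuing a sentence would repeat the sampling step, feeding each sampled word back into the model.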
The LSTM implementation is based on the Theano library. Theano and Gensim both support fast calculations with native CPU or GPU (CUDA) code. For more information, see the CUDA, Theano and Gensim documentation.
Tested on Ubuntu 14.04:
sudo pip3 install brocas-lm
from brocas_lm.model import Normalization
from brocas_lm.model import NormalizationIter
from brocas_lm.model import LanguageModel
Complete documentation will be available soon. In the meantime, see the examples folder for basic usage information.