brocas-lm

Broca's LM is a free Python library providing a probabilistic language model based on a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). It uses Gensim's Word2Vec implementation to map input word sequences into a dense vector space. The output of the model is a sequence of probability distributions over the given vocabulary.


License
MIT
Install
pip install brocas-lm==1.0

Documentation

Broca's Language Model

This library is named after Broca's area, one of the main language-processing regions of the human brain. Broca's area is primarily associated with language production and is also involved in language comprehension.

Features

  • determine sentence probabilities
  • generate random sentences (sampling)
  • continue incomplete word sequences (sampling)
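
The features above all follow from the model's output being one probability distribution per position. The sketch below is not the brocas-lm API. It uses only the Python standard library, with a made-up vocabulary (`VOCAB`) and hand-written distributions (`step_dists`) standing in for model output, and the helper names `sentence_log_prob` and `sample_word` are hypothetical. It illustrates how a sentence probability can be obtained by multiplying the probabilities of the observed words (summed in log space), and how a continuation can be drawn by sampling from the last distribution.

```python
import math
import random

# Hypothetical toy vocabulary; a trained model would define its own.
VOCAB = ["<s>", "the", "cat", "sat", "</s>"]


def sentence_log_prob(step_dists, word_ids):
    """Sum the log-probabilities assigned to each observed word.

    step_dists[t] is the distribution over VOCAB predicted for position t;
    word_ids[t] is the index of the word actually observed at that position.
    """
    return sum(math.log(dist[w]) for dist, w in zip(step_dists, word_ids))


def sample_word(dist, rng):
    """Draw one vocabulary index according to a probability distribution."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


# Made-up distributions standing in for model output (each row sums to 1).
step_dists = [
    [0.0, 0.7, 0.1, 0.1, 0.1],  # prediction after "<s>"
    [0.0, 0.1, 0.6, 0.2, 0.1],  # prediction after "the"
    [0.0, 0.1, 0.1, 0.5, 0.3],  # prediction after "cat"
]
word_ids = [1, 2, 3]  # "the cat sat"

# Sentence probability: product of per-word probabilities, 0.7 * 0.6 * 0.5.
log_p = sentence_log_prob(step_dists, word_ids)
probability = math.exp(log_p)

# Sampling: draw a word from the last predicted distribution.
rng = random.Random(0)
next_word = VOCAB[sample_word(step_dists[-1], rng)]
```

Working in log space avoids numerical underflow for long sentences, where the product of many small probabilities would otherwise round to zero.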

Dependencies

The LSTM implementation is based on the Theano library. Theano and Gensim both support fast computation using native CPU or GPU (CUDA) code. For more information, see the CUDA, Theano, and Gensim documentation.

Installation

Tested on Ubuntu 14.04:

sudo pip3 install brocas-lm

Usage

Import packages:

from brocas_lm.model import Normalization
from brocas_lm.model import NormalizationIter
from brocas_lm.model import LanguageModel

Documentation

Complete documentation will be available soon. See the examples folder for basic usage information.