Broca's Language Model

Broca's LM is a free Python library that provides a probabilistic language model based on a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). It uses Gensim's Word2Vec implementation to map input word sequences into a dense vector space. The model outputs a sequence of probability distributions over the given vocabulary.

This library is named after Broca's area, one of the main language processors in the human brain. Broca's area is responsible for language comprehension as well as language production.

Features

determine sentence probabilities
generate random sentences (sampling)
continue incomplete word sequences (sampling)
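
The features above all follow from one building block: a distribution over the next word given the words so far. The sketch below illustrates the idea with a hand-written toy table instead of a trained network. It does not use the brocas_lm API; `VOCAB`, `next_word_distribution`, `sentence_log_probability` and `continue_sequence` are hypothetical names chosen for this illustration, and the probabilities are made up.

```python
import math
import random

# Hypothetical toy vocabulary; "</s>" marks the end of a sentence.
VOCAB = ["the", "cat", "sat", "</s>"]


def next_word_distribution(context):
    """Stand-in for a trained model: P(next word | context) over VOCAB.

    A real LSTM conditions on the whole context; this toy version
    only looks at the last word.
    """
    table = {
        None: [0.7, 0.2, 0.1, 0.0],
        "the": [0.0, 0.8, 0.1, 0.1],
        "cat": [0.1, 0.0, 0.7, 0.2],
        "sat": [0.2, 0.0, 0.0, 0.8],
    }
    last = context[-1] if context else None
    return table[last]


def sentence_log_probability(words):
    """Chain rule: sum the log-probability of each word given its prefix."""
    total = 0.0
    for i, word in enumerate(words):
        p = next_word_distribution(words[:i])[VOCAB.index(word)]
        if p == 0.0:
            return float("-inf")  # impossible under the toy table
        total += math.log(p)
    return total


def continue_sequence(prefix, max_words=10, rng=random):
    """Sample one word at a time until "</s>" or max_words samples."""
    words = list(prefix)
    for _ in range(max_words):
        probs = next_word_distribution(words)
        word = rng.choices(VOCAB, weights=probs, k=1)[0]
        if word == "</s>":
            break
        words.append(word)
    return words
```

Calling `continue_sequence([])` generates a random sentence from scratch, while `continue_sequence(["the"])` continues an incomplete sequence. Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on long sentences.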

Dependencies

Python >= 3.4
Theano
Gensim
NumPy
NLTK (for examples only)

The LSTM implementation is based on the Theano library. Theano and Gensim both support fast computation with native CPU or GPU (CUDA) code. For more information, see the CUDA, Theano and Gensim documentation.
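
As a minimal sketch of how the compute device is usually selected for a Theano-based library, the snippet below sets the `THEANO_FLAGS` environment variable from Python. The flag values shown are an assumption about a typical setup, not something this project prescribes; check the Theano documentation for the values that match your installed version.

```python
import os

# Theano reads THEANO_FLAGS when it is first imported, so set the variable
# before importing theano or any library that imports it.
os.environ["THEANO_FLAGS"] = "device=cpu,floatX=float32"

# For a GPU run, the device name depends on the Theano version:
# "gpu" with the old backend, "cuda" with the newer gpuarray backend.
# os.environ["THEANO_FLAGS"] = "device=cuda,floatX=float32"
```

The same flags can also be set in the shell environment or in a `.theanorc` file instead of in code.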

Installation

Tested on Ubuntu 14.04:

sudo pip3 install brocas-lm

Usage

Import packages:

from brocas_lm.model import Normalization
from brocas_lm.model import NormalizationIter
from brocas_lm.model import LanguageModel

Documentation

Complete documentation will be available soon. In the meantime, see the examples folder for basic usage information.

brocas-lm
Release 1.0
