jmoy/norvig-spell


Different Implementations of Norvig's spellchecker

Language: C


Implementations of Norvig's spelling corrector in various languages.

Languages

hat-trie is the fast trie implementation by dcjones.

Building

The hat-trie library is included as a submodule. Doing

git submodule init
git submodule update
bash -c 'cd hat-trie && autoreconf -i && ./configure && make'

will build it.

The Haskell programs depend on the bytestring, bytestring-trie and unordered-containers packages.

The Java program requires Java 8 and Maven.

Once you have all the dependencies, running make at the top level will build all the programs and place them in bin/.

Each program can be run as

norvig_xx [training data set]

where [training data set] is a plain text file from which the program will learn word frequencies. The programs expect to be given one word per line on standard input and print

word, correction

pairs on standard output.

The folder data/ has a training file train.txt and a test file test.txt. The first is exactly Norvig's big.txt. The second is Norvig's test set but with multiword strings removed.

Performance

On Linux,running make benchmark creates a file benchmarks/all.md containing performance results.

The different implementations use different algorithms and code organization, so the benchmarks are not in any sense a comparison of the different languages.

This is what I get on my setup.

Version Time (s) Memory use
(Max resident,M)
Lines of code
Python 12.6 75.5 31
C++ (unordered_map) 6.0 5.6 165
C++ (hat-trie) 3.5 3.6 166
Haskell (HashMap) 11.5 81.3 60
Haskell (Trie) 22.3 157.0 61
C (dynamic programming) 0.3 162.7 225
Racket (v.6.1.1) 16.2 277.7 73
(with tests)
Java 10.8 363.2 100

Project Statistics

Sourcerank 5
Repository Size 2.7 MB
Stars 33
Forks 4
Watchers 2
Open issues 0
Dependencies 0
Contributors 4
Tags 0
Created
Last updated
Last pushed

Top Contributors See all

Jyotirmoy Bhattacharya Sam Pullara gus-massa Matthias Felleisen

Something wrong with this page? Make a suggestion

Last synced: 2017-11-24 19:38:47 UTC

Login to resync this repository