Implementations of Norvig's spelling corrector in various languages.
- Python (a minor modification of Norvig's original program)
- C++14 (common header)
- Java 8
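All of the implementations follow the algorithm from Norvig's essay: generate candidate corrections within a small edit distance and pick the one seen most often in the training text. A condensed Python sketch of that core (the toy `WORDS` counter here stands in for real training data):

```python
import re
from collections import Counter

def words(text):
    # Tokenize into lowercase alphabetic words, as Norvig's original does.
    return re.findall(r'[a-z]+', text.lower())

# Toy training data; the real programs build this from [training data set].
WORDS = Counter(words("The quick brown fox the the"))

def edits1(word):
    # All strings one edit (delete, transpose, replace, insert) away.
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def known(ws):
    # Keep only candidates that appeared in the training data.
    return set(w for w in ws if w in WORDS)

def correction(word):
    # Prefer the word itself, then known words one edit away, then two,
    # falling back to the input; break ties by training frequency.
    candidates = (known([word]) or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    return max(candidates, key=WORDS.get)
```

The per-language versions differ in data structures (e.g. the hat-trie in C++) and code organization, not in this basic scheme.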
hat-trie is a fast trie implementation by dcjones. The hat-trie library is included as a submodule; running

```
git submodule init
git submodule update
bash -c 'cd hat-trie && autoreconf -i && ./configure && make'
```

will build it.
The Haskell programs depend on the
The Java program requires Java 8 and Maven.
Once you have all the dependencies, running `make` at the top level will build all the programs and place them in
Each program can be run as

```
norvig_xx [training data set]
```

where `[training data set]` is a plain text file from which the program will learn word frequencies. The programs expect to be given one word per line on standard input and print pairs on standard output.
`data/` has a training file `train.txt` and a test file `test.txt`. The first is exactly Norvig's `big.txt`; the second is Norvig's test set with multiword strings removed.
`make benchmark` creates a file `benchmarks/all.md` containing performance results.
The different implementations use different algorithms and code organization, so the benchmarks are not in any sense a comparison of the different languages.
This is what I get on my setup.
| Version | Time (s) | Memory use | Lines of code |
|---------|----------|------------|---------------|
| C (dynamic programming) | 0.3 | 162.7 | 225 |