The Levenshtein Python C extension module contains functions for fast computation of:
- Levenshtein (edit) distance, and edit operations
- string similarity
- approximate median strings, and generally string averaging
- string sequence and set similarity
It supports both normal and Unicode strings.
Python 2.2 or newer is required; Python 3 is supported.
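To make the first item concrete, here is a minimal pure-Python sketch of what "edit distance" means. It is a reference illustration only, not the extension's C implementation; the name edit_distance is hypothetical and does not come from this package.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

With the extension installed, Levenshtein.distance("kitten", "sitting") should report the same value, only much faster on long inputs.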
StringMatcher.py is an example SequenceMatcher-like class built on top of Levenshtein. It lacks some of SequenceMatcher's functionality but adds some extras of its own.
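For orientation, this is the kind of interface the standard library's difflib.SequenceMatcher offers, which StringMatcher is modeled on. The sketch uses only difflib; which of these methods StringMatcher provides, and with what exact behavior, is an assumption to be checked against StringMatcher.py.

```python
from difflib import SequenceMatcher

a, b = "kitten", "sitting"
matcher = SequenceMatcher(None, a, b)  # None: no "junk" filter

ratio = matcher.ratio()                 # similarity score in [0, 1]
opcodes = matcher.get_opcodes()         # how to turn a into b
blocks = matcher.get_matching_blocks()  # common runs; last entry is a sentinel

for tag, i1, i2, j1, j2 in opcodes:
    # tag is one of 'equal', 'replace', 'delete', 'insert'
    print(tag, repr(a[i1:i2]), "->", repr(b[j1:j2]))
```

Note that difflib's ratio is based on matching blocks rather than on Levenshtein distance, so a Levenshtein-based matcher can give different scores for the same pair of strings.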
Levenshtein.c can also be used as a pure C library. You only have to define the NO_PYTHON preprocessor symbol (-DNO_PYTHON) when compiling it. The functionality is similar to that of the Python extension. No separate documentation is provided yet, so read the source. The two builds are not interchangeable, however:
- C functions exported when compiling with -DNO_PYTHON (see Levenshtein.h) are not exported when compiling as a Python extension (and vice versa)
- The Unicode character type used with -DNO_PYTHON is wchar_t, while the Python extension uses Py_UNICODE; they may be the same, but don't count on it
gendoc.sh generates HTML API documentation.
You probably want the self-contained version rather than the includable one, so run
./gendoc.sh --selfcontained. The script needs Levenshtein to be installed already.
Levenshtein is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
See the file COPYING for the full text of the GNU General Public License version 2.
- Maintainer: Antti Haapala <firstname.lastname@example.org>
- Python 3 compatibility: Esa Määttä
- Documentation generation fix: Jonatas CD
- Previous maintainer: Mikko Ohtamaa
- Original code: David Necas (Yeti) <yeti at physics.muni.cz>