Abydos is a library of phonetic algorithms, string distance measures & metrics, stemmers, and string fingerprinters including:
Phonetic algorithms:
- Robert C. Russell's Index
- American Soundex
- Refined Soundex
- Daitch-Mokotoff Soundex
- Kölner Phonetik
- NYSIIS
- Match Rating Algorithm
- Metaphone
- Double Metaphone
- Caverphone
- Alpha Search Inquiry System
- Fuzzy Soundex
- Phonex
- Phonem
- Phonix
- SfinxBis
- phonet
- Standardized Phonetic Frequency Code
- Statistics Canada
- Lein
- Roger Root
- Oxford Name Compression Algorithm (ONCA)
- Eudex phonetic hash
- Haase Phonetik
- Reth-Schek Phonetik
- FONEM
- Parmar-Kumbharana
- Davidson's Consonant Code
- SoundD
- PSHP Soundex/Viewex Coding
- an early version of Henry Code
- Norphone
- Dolby Code
- Phonetic Spanish
- Spanish Metaphone
- MetaSoundex
- SoundexBR
- NRL English-to-phoneme
- Beider-Morse Phonetic Matching
String distance metrics:
- Levenshtein distance
- Optimal String Alignment distance
- Damerau-Levenshtein distance
- Hamming distance
- Tversky index
- Sørensen–Dice coefficient & distance
- Jaccard similarity coefficient & distance
- overlap similarity & distance
- Tanimoto coefficient & distance
- Minkowski distance & similarity
- Manhattan distance & similarity
- Euclidean distance & similarity
- Chebyshev distance
- cosine similarity & distance
- Jaro distance
- Jaro-Winkler distance (incl. the strcmp95 algorithm variant)
- Longest common substring
- Ratcliff-Obershelp similarity & distance
- Match Rating Algorithm similarity
- Normalized Compression Distance (NCD) & similarity
- Monge-Elkan similarity & distance
- Matrix similarity
- Needleman-Wunsch score
- Smith-Waterman score
- Gotoh score
- Length similarity
- Prefix, Suffix, and Identity similarity & distance
- Modified Language-Independent Product Name Search (MLIPNS) similarity & distance
- Bag distance
- Editex distance
- Eudex distances
- Sift4 distance
- Baystat distance & similarity
- Typo distance
- Indel distance
- Synoname
Stemmers:
- the Lovins stemmer
- the Porter and Porter2 (Snowball English) stemmers
- Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
- CLEF German, German plus, and Swedish stemmers
- Caumanns' German stemmer
- UEA-Lite Stemmer
- Paice-Husk Stemmer
- Schinke Latin stemmer
- S stemmer
String fingerprints:
- string fingerprint
- q-gram fingerprint
- phonetic fingerprint
- Pollock & Zamora's skeleton key
- Pollock & Zamora's omission key
- Cisłak & Grabowski's occurrence fingerprint
- Cisłak & Grabowski's occurrence halved fingerprint
- Cisłak & Grabowski's count fingerprint
- Cisłak & Grabowski's position fingerprint
- Synoname Toolcode
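To give a feel for two of the algorithm families listed above, here is a minimal pure-Python sketch of American Soundex (a phonetic algorithm) and Levenshtein distance (a string distance measure). The function names `soundex` and `levenshtein` are hypothetical and chosen for illustration only; this is not Abydos's implementation or API, and the library's own versions may handle edge cases differently.

```python
# Illustrative sketches only; NOT taken from Abydos.
# The names soundex() and levenshtein() are assumptions made for this example.

# Map each consonant to its American Soundex digit.
_CODES = {}
for _digit, _letters in enumerate(
    ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1
):
    for _letter in _letters:
        _CODES[_letter] = str(_digit)


def soundex(word, length=4):
    """Return a simplified American Soundex code for word."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]                  # the first letter is kept as-is
    last = _CODES.get(letters[0], "")  # its digit still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal codes
        digit = _CODES.get(ch, "")     # vowels map to "" and reset `last`
        if digit and digit != last:
            code += digit
        last = digit
    return (code + "0" * length)[:length]  # pad or truncate to `length`


def levenshtein(src, tar):
    """Return the number of single-character edits turning src into tar."""
    # prev[j] is the distance between the current prefix of src and tar[:j].
    prev = list(range(len(tar) + 1))
    for i, s_ch in enumerate(src, start=1):
        curr = [i]
        for j, t_ch in enumerate(tar, start=1):
            cost = 0 if s_ch == t_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(levenshtein("kitten", "sitting"))      # 3
```

Names that sound alike ("Robert" and "Rupert") receive the same phonetic code, while the edit distance counts how many insertions, deletions, and substitutions separate two spellings.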
Required libraries:
- NumPy
- deprecation
Optional libraries (all available on PyPI, some available on conda or conda-forge):
To install Abydos (master) from GitHub source:
git clone https://github.com/chrislit/abydos.git --recursive
cd abydos
python setup.py install
If your default python command calls Python 2.7 but you want to install for Python 3, you may instead need to call:
python3 setup.py install
To install Abydos (latest release) from PyPI using pip:
pip install abydos
To install from conda-forge:
conda install -c conda-forge abydos
Abydos should run on Python 3.5 through 3.8.
To run the whole test suite, call tox:
tox
The tox setup has the following environments: black, py37, doctest, regression, fuzz, pylint, pydocstyle, flake8, doc8, docs, sloccount, badges, & build. To generate only the documentation (in HTML, EPUB, & PDF formats), call:
tox -e docs
To run only Flake8 and generate its reports, call:
tox -e flake8
Contributions such as bug reports, PRs, suggestions, and feature requests are welcome through GitHub Issues & Pull Requests.