text-metrics

Calculate various string metrics efficiently.


Keywords
bsd3, library, text, Data.Text.Metrics, Levenshtein distance, Normalized Levenshtein distance, Damerau-Levenshtein distance, Normalized Damerau-Levenshtein distance, Hamming distance, Jaro distance, Jaro-Winkler distance, Overlap coefficient, Jaccard similarity coefficient, edit-distance, hamming-distance, haskell, jaccard-similarity, jaro-distance, jaro-winkler-distance, levenshtein-distance, string-metrics
License
BSD-3-Clause
Install
cabal install text-metrics-0.3.2

Documentation

Text Metrics

The library provides efficient implementations of various string metric algorithms. It works with strict Text values.

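The current version of the package implements:

  • Levenshtein distance

  • Normalized Levenshtein distance

  • Damerau-Levenshtein distance

  • Normalized Damerau-Levenshtein distance

  • Hamming distance

  • Jaro distance

  • Jaro-Winkler distance

  • Overlap coefficient

  • Jaccard similarity coefficient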

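A minimal usage sketch follows. It assumes the Data.Text.Metrics module exports functions named after the metrics (levenshtein, levenshteinNorm, damerauLevenshtein, hamming, jaro, jaroWinkler, overlap, jaccard), as in the 0.3.x series; check the Haddocks for the exact signatures in the version you install. The input strings are arbitrary examples.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import Data.Text (Text)
import qualified Data.Text.Metrics as M

main :: IO ()
main = do
  let a = "kitten"  :: Text
      b = "sitting" :: Text

  -- Edit distances count the edits needed to turn one string into the other.
  print (M.levenshtein a b)               -- 3
  print (M.damerauLevenshtein "ab" "ba")  -- a single transposition

  -- Normalized variants return a rational number between 0 and 1.
  print (M.levenshteinNorm a b)

  -- Hamming distance is only defined for inputs of equal length,
  -- so the result is wrapped in Maybe.
  print (M.hamming "karolin" "kathrin")   -- Just 3
  print (M.hamming "abc" "ab")            -- Nothing

  -- Similarity measures, also returned as rational numbers.
  print (M.jaro "martha" "marhta")
  print (M.jaroWinkler "martha" "marhta")
  print (M.overlap a b)
  print (M.jaccard a b)
```

The OverloadedStrings extension lets string literals be used directly as strict Text values, which is the only input type the functions accept.
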
Comparison with the edit-distance package

There is also the edit-distance package, whose scope overlaps with the scope of this package. The differences are:

  • edit-distance allows specifying costs for every operation (insertion, deletion, substitution, and transposition) when calculating Levenshtein distance. This is rarely needed in real-world applications, though, IMO.

  • edit-distance only provides Levenshtein distance, whereas text-metrics aims to provide implementations of most string metric algorithms.

  • edit-distance works on Strings, while text-metrics works on strict Text values.
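
The differences above can be sketched side by side. This is an illustration rather than a benchmark: it assumes the Text.EditDistance module of edit-distance (levenshteinDistance, defaultEditCosts, and the EditCosts record with ConstantCost) and levenshtein from Data.Text.Metrics returning Int, as in the 0.3.x series. The helper names viaTextMetrics, viaEditDistance, and withCostlySubstitution are hypothetical and exist only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Metrics (levenshtein)
import Text.EditDistance
  ( Costs (ConstantCost)
  , EditCosts (substitutionCosts)
  , defaultEditCosts
  , levenshteinDistance
  )

-- text-metrics works on strict Text directly.
viaTextMetrics :: Text -> Text -> Int
viaTextMetrics = levenshtein

-- edit-distance works on String, so Text inputs must be unpacked first.
viaEditDistance :: Text -> Text -> Int
viaEditDistance a b =
  levenshteinDistance defaultEditCosts (T.unpack a) (T.unpack b)

-- edit-distance also accepts custom per-operation costs;
-- here a substitution costs 2 instead of the default 1.
withCostlySubstitution :: Text -> Text -> Int
withCostlySubstitution a b =
  levenshteinDistance costs (T.unpack a) (T.unpack b)
  where
    costs = defaultEditCosts { substitutionCosts = ConstantCost 2 }

main :: IO ()
main = do
  print (viaTextMetrics "kitten" "sitting")          -- 3
  print (viaEditDistance "kitten" "sitting")         -- 3
  print (withCostlySubstitution "kitten" "sitting")
```

With default costs both libraries should agree; the conversion through T.unpack is the extra step edit-distance requires when the data is already stored as Text.
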

Implementation

Although we originally used C for speed, currently all functions are pure Haskell tuned for performance. See this blog post for more info.

Contribution

Issues, bugs, and questions may be reported on the GitHub issue tracker for this project.

Pull requests are also welcome.

License

Copyright © 2016–present Mark Karpov

Distributed under the BSD 3-clause license.