FastCountVectorizer
FastCountVectorizer is a faster alternative to scikit-learn's CountVectorizer.
Installation
TBD
Documentation
TBD
Deviations from scikit-learn implementation
FastCountVectorizer mostly implements a subset of CountVectorizer's behavior. The main deviation is that it does not normalize whitespace. Skipping normalization is arguably the better default, but changing it in scikit-learn would break backwards compatibility.
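The effect of whitespace normalization on character n-grams can be illustrated with a small pure-Python sketch. It is not code from either library: `char_ngrams` is a hypothetical helper, and the `\s\s+` pattern is assumed to mirror how scikit-learn's character analyzer collapses runs of whitespace into a single space.

```python
import re
from collections import Counter

# Matches runs of two or more whitespace characters. Assumption: this
# mirrors the normalization done by scikit-learn's character analyzer.
_WHITE_SPACES = re.compile(r"\s\s+")


def char_ngrams(text, n, normalize_whitespace):
    """Count character n-grams of length n (hypothetical helper)."""
    if normalize_whitespace:
        # Collapse each whitespace run into a single space first.
        text = _WHITE_SPACES.sub(" ", text)
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


doc = "a  b"  # two spaces between the letters

# With normalization, the double space disappears from the features.
normalized = char_ngrams(doc, 2, normalize_whitespace=True)

# Without normalization, the double space is counted as its own bigram.
raw = char_ngrams(doc, 2, normalize_whitespace=False)

print(sorted(normalized))
print(sorted(raw))
```

With normalization the document yields two bigrams. Without it there are three, because the run of two spaces is kept as a feature. Vocabularies built with the two approaches can therefore differ on text that contains repeated whitespace.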
License
Copyright (c) 2020 Santiago M. Mola
FastCountVectorizer is released under the MIT License.
The following files are included from or derived from third-party projects:
- fastcountvectorizer.py is derived from scikit-learn's scikit-learn/sklearn/feature_extraction/text.py, licensed under a 3-clause BSD license. The original list of authors and license text can be found in the file header.
- fastcountvectorizer/thirdparty/tsl includes the tsl::sparse_map project, released under the MIT License.
- fastcountvectorizer/thirdparty includes the xxHash project, released under a BSD 2-Clause license.