fastcountvectorizer

A faster CountVectorizer alternative.


Keywords: sklearn, scikit-learn, nlp, ngrams, natural-language-processing, python
License: MIT
Install: pip install fastcountvectorizer==0.1.0

FastCountVectorizer

FastCountVectorizer is a faster alternative to scikit-learn's CountVectorizer.

Installation

TBD

Documentation

TBD

Deviations from scikit-learn implementation

FastCountVectorizer mostly behaves as a subset of CountVectorizer, with one exception: it does not normalize whitespace. Skipping normalization is arguably the better default, but changing it in scikit-learn would break backwards compatibility.
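
As a rough illustration of what this deviation means for character n-grams, the sketch below counts n-grams with and without whitespace normalization. It is plain Python, not the code of either library: the helper `char_ngrams` and its `normalize_whitespace` flag are hypothetical, and it assumes that normalization means collapsing each run of two or more whitespace characters into a single space.

```python
import re
from collections import Counter

# Assumption: "whitespace normalization" collapses each run of two or more
# whitespace characters into a single space before n-grams are extracted.
_WHITESPACE_RUN = re.compile(r"\s\s+")


def char_ngrams(text, n, normalize_whitespace):
    """Count character n-grams, optionally collapsing whitespace runs first.

    Hypothetical helper for illustration only; it is not part of
    FastCountVectorizer or scikit-learn.
    """
    if normalize_whitespace:
        text = _WHITESPACE_RUN.sub(" ", text)
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


doc = "a  b"  # two spaces between the letters

normalized = char_ngrams(doc, 2, normalize_whitespace=True)
raw = char_ngrams(doc, 2, normalize_whitespace=False)

print(sorted(normalized))  # [' b', 'a ']
print(sorted(raw))         # ['  ', ' b', 'a ']
```

Without normalization the double space survives as its own bigram, so documents that differ only in spacing yield different features.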

License

Copyright (c) 2020 Santiago M. Mola

FastCountVectorizer is released under the MIT License.

The following files are included from or derived from third-party projects: