A toolkit to download, train, and use fastText word vectors on text data. It also lets you deduplicate data based on its TF-IDF representation (see txtMatcher).

Developed under the MIT license by Openvalue: http://openvalue.co
- For more info on fastText, see :
- This lib uses gensim's implementation of fastText.
OVNLP runs on Python 3.6 ONLY.
> pip install ovnlp
See demo_notebook.ipynb for usage examples.
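The notebook covers the library's actual API. To illustrate only the general idea behind TF-IDF based deduplication, here is a minimal sketch in plain Python. It does not use txtMatcher, and the function names (`tfidf_vectors`, `cosine`, `deduplicate`) and the `threshold` parameter are hypothetical, chosen for this example.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: token -> weight) per document."""
    tokenized = [re.findall(r"\w+", doc.lower()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF, so tokens present in every document keep a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def deduplicate(docs, threshold=0.8):
    """Keep a document only if it is not too similar to one already kept."""
    vectors = tfidf_vectors(docs)
    kept = []  # indices of documents retained so far
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [docs[i] for i in kept]
```

With this sketch, two texts that differ only in case or punctuation are treated as duplicates and only the first is kept. The pairwise comparison is quadratic in the number of documents, so it is only suitable for small datasets.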
fastText weights source
Pretrained weights from Facebook:
- French vectors trained on Common Crawl: https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.fr.300.bin.gz
Feel free to change weightsource.json to add data sources if needed.
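As a rough sketch of what adding a source and fetching its weights could look like, the snippet below assumes the JSON file is a flat mapping from a source name to a URL. That layout is an assumption, not the documented schema of weightsource.json, so check the real file before editing it. The helpers `add_source` and `download_and_unzip` are hypothetical and not part of ovnlp.

```python
import gzip
import json
import shutil
import urllib.request
from pathlib import Path


def add_source(json_path, name, url):
    """Add (or overwrite) a name -> URL entry in a JSON file; return the mapping.

    Assumes a flat {"name": "url"} layout, which may differ from the real file.
    """
    path = Path(json_path)
    sources = json.loads(path.read_text()) if path.exists() else {}
    sources[name] = url
    path.write_text(json.dumps(sources, indent=2))
    return sources


def download_and_unzip(url, dest_dir="."):
    """Download a .gz archive and decompress it next to the archive."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive))
    target = archive.with_suffix("")  # e.g. cc.fr.300.bin.gz -> cc.fr.300.bin
    with gzip.open(str(archive), "rb") as src, open(str(target), "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The pretrained archives are several gigabytes, so a real download takes a while and needs enough free disk space for both the archive and the decompressed file.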