word2vec - for Nim
Word2vec can be used to turn text into vectors that encode its meaning. You can use these vectors to compare how similar two texts are.
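One common way to turn a whole text into a single vector is to average the vectors of its words. The README does not say how `text2vec` builds its vector, so the sketch below only illustrates the idea under that averaging assumption. `toyVocab` and `textToVec` are hypothetical names, not part of the library, and the 3-dimensional vectors are made up for illustration.

```nim
import std/[tables, strutils]

# Hypothetical toy vocabulary of 3-dimensional vectors; real GloVe
# vectors have 50 to 300 dimensions and are loaded from disk.
let toyVocab = {
  "cat": @[1.0, 0.0, 0.5],
  "dog": @[0.9, 0.1, 0.4],
  "wall": @[0.0, 1.0, 0.2]
}.toTable

proc textToVec(text: string, vocab: Table[string, seq[float]],
               dims: int): seq[float] =
  ## Averages the vectors of all known words; unknown words are skipped.
  ## Returns a zero vector if no word is known.
  result = newSeq[float](dims)
  var known = 0
  for word in splitWhitespace(toLowerAscii(text)):
    if word in vocab:
      let v = vocab[word]
      for i in 0 ..< dims:
        result[i] += v[i]
      inc known
  if known > 0:
    for i in 0 ..< dims:
      result[i] /= known.float

echo textToVec("Cat wall", toyVocab, 3)
```

Averaging ignores word order, so "cat wall" and "wall cat" get the same vector; that is a known limitation of this simple approach.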
Example
import word2vec
load(300) # load the converted binary vector file (it is large)
let
  aVec = text2vec("Cat sat on a red wall")
  bVec = text2vec("Dog sat on a red fence")
# how different are they?
echo dist(aVec, bVec)
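The README does not say which metric `dist` uses. Two common ways to compare word vectors are Euclidean distance (smaller means more similar) and cosine similarity (closer to 1.0 means more similar). A minimal sketch of both; `euclideanDist` and `cosineSimilarity` are hypothetical names, not the library's API.

```nim
import std/math

proc euclideanDist(a, b: seq[float]): float =
  ## Straight-line distance between two vectors; 0.0 means identical.
  assert a.len == b.len
  var total = 0.0
  for i in 0 ..< a.len:
    let d = a[i] - b[i]
    total += d * d
  sqrt(total)

proc cosineSimilarity(a, b: seq[float]): float =
  ## Cosine of the angle between two vectors; 1.0 means same direction.
  ## Returns 0.0 if either vector is all zeros.
  assert a.len == b.len
  var dot, normA, normB = 0.0
  for i in 0 ..< a.len:
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  if normA == 0.0 or normB == 0.0:
    return 0.0
  dot / (sqrt(normA) * sqrt(normB))

echo euclideanDist(@[0.0, 0.0], @[3.0, 4.0])
echo cosineSimilarity(@[1.0, 0.0], @[0.0, 1.0])
```

Cosine similarity ignores vector length and looks only at direction, which is often preferred for word vectors; Euclidean distance is sensitive to both.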
Getting started
This library uses pre-trained GloVe vectors, so there is no need to train your own.
Before you start, you need to download and convert them:
- Download the GloVe vectors: https://nlp.stanford.edu/projects/glove/
- Unzip glove.6B.zip.
- Run tools/word2vecloader.nim to convert the text files into binary files that load faster.
mkdir glovebin
cd glovebin
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
cd ..
nim c -r tools/word2vecloader.nim
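Each line of a GloVe text file holds a word followed by its vector components, separated by spaces. The sketch below assumes that format and shows one possible text-to-binary conversion. The binary layout here (an `int32` word length, the word bytes, then `float32` components) is a hypothetical choice, not necessarily what `tools/word2vecloader.nim` writes, and `parseGloveLine`, `writeEntry` and `readEntry` are illustrative names.

```nim
import std/[streams, strutils]

# Parse one line of a GloVe text file: the word, then its components.
proc parseGloveLine(line: string): (string, seq[float32]) =
  let parts = line.splitWhitespace
  result[0] = parts[0]
  for p in parts[1 .. ^1]:
    result[1].add parseFloat(p).float32

# Hypothetical binary layout: word length (int32), word bytes,
# then the vector's float32 components.
proc writeEntry(s: Stream, word: string, vec: seq[float32]) =
  s.write(word.len.int32)
  s.write(word)
  for x in vec:
    s.write(x)

# Read one entry back; `dims` must match the vector size that was written.
proc readEntry(s: Stream, dims: int): (string, seq[float32]) =
  let n = s.readInt32()
  result[0] = s.readStr(n.int)
  for i in 0 ..< dims:
    result[1].add s.readFloat32()

# Round trip on a made-up sample line.
let (word, vec) = parseGloveLine("cat 0.5 -0.25 1.0")
let buf = newStringStream()
buf.writeEntry(word, vec)
buf.setPosition(0)
let (readWord, readVec) = buf.readEntry(vec.len)
echo readWord, " ", readVec
```

Reading raw `float32` values back skips the text parsing step, which is likely why binary files load faster than the original text files.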