tf-word2vec

Word2Vec implementation with TensorFlow Estimators and Datasets


Keywords
word2vec, word, embeddings, tensorflow, estimators, datasets, tensorflow-datasets, tensorflow-estimators, tensorflow2
License
MIT
Install
pip install tf-word2vec==1.0.7

Documentation

Word2Vec


This is a re-implementation of Word2Vec relying on TensorFlow Estimators and Datasets.

Works with Python >= 3.6 and TensorFlow v2.0.
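Since the training loop is expressed through the tf.estimator and tf.data APIs, the general wiring looks roughly like the sketch below. This is not this package's actual code: the vocabulary size, the random (target, context) pairs and the NCE loss setup are placeholder assumptions, kept only to show how a skip-gram Estimator is fed by a Dataset.

import tensorflow as tf

VOCAB_SIZE = 50000  # assumed vocabulary size
EMBED_DIM = 300     # embedding dimensionality (cf. --size)
NEG_SAMPLES = 5     # negative samples per positive pair (cf. --neg)

def input_fn():
    # Real code would stream (target, context) pairs from the corpus;
    # random integer ids stand in for them here.
    targets = tf.random.uniform([10000], 0, VOCAB_SIZE, dtype=tf.int64)
    contexts = tf.random.uniform([10000, 1], 0, VOCAB_SIZE, dtype=tf.int64)
    dataset = tf.data.Dataset.from_tensor_slices((targets, contexts))
    return dataset.shuffle(10000).batch(128).repeat()

def model_fn(features, labels, mode):
    # Input embeddings: the vectors usually kept after training.
    embeddings = tf.compat.v1.get_variable(
        'embeddings', [VOCAB_SIZE, EMBED_DIM],
        initializer=tf.compat.v1.random_uniform_initializer(-1.0, 1.0))
    embedded = tf.nn.embedding_lookup(embeddings, features)
    # Output weights and biases for the NCE (negative sampling) loss.
    nce_weights = tf.compat.v1.get_variable(
        'nce_weights', [VOCAB_SIZE, EMBED_DIM],
        initializer=tf.compat.v1.truncated_normal_initializer(stddev=0.1))
    nce_biases = tf.compat.v1.get_variable(
        'nce_biases', [VOCAB_SIZE],
        initializer=tf.compat.v1.zeros_initializer())
    loss = tf.reduce_mean(tf.nn.nce_loss(
        weights=nce_weights, biases=nce_biases, labels=labels,
        inputs=embedded, num_sampled=NEG_SAMPLES, num_classes=VOCAB_SIZE))
    optimizer = tf.compat.v1.train.GradientDescentOptimizer(0.025)
    train_op = optimizer.minimize(
        loss, global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn, model_dir='/tmp/w2v-sketch')
estimator.train(input_fn, steps=100)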

Install

To install from source, after a git clone:

python3 setup.py install
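If the install succeeded, the w2v command-line entry point used in the examples below should be available on your PATH.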

Get data

You can download a sample of the English Wikipedia here:

wget http://129.194.21.122/~kabbach/enwiki.20190120.sample10.0.balanced.txt.7z
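Note that the sample is 7z-compressed; decompress it first (for example with p7zip: 7z x enwiki.20190120.sample10.0.balanced.txt.7z) to obtain the plain .txt file that the training command below expects.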

Train Word2Vec

w2v train \
  --data /absolute/path/to/enwiki.20190120.sample10.0.balanced.txt \
  --outputdir /absolute/path/to/word2vec/models \
  --alpha 0.025 \
  --neg 5 \
  --window 2 \
  --epochs 5 \
  --size 300 \
  --min-count 50 \
  --sample 1e-5 \
  --train-mode skipgram \
  --t-num-threads 20 \
  --p-num-threads 25 \
  --keep-checkpoint-max 3 \
  --batch 1 \
  --shuffling-buffer-size 10000 \
  --save-summary-steps 10000 \
  --save-checkpoints-steps 100000 \
  --log-step-count-steps 10000
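
Most hyperparameters mirror the original word2vec ones: --alpha is the initial learning rate, --neg the number of negative samples, --window the context window size, --size the embedding dimensionality, --min-count the frequency threshold below which words are discarded, --sample the subsampling threshold for frequent words, and --train-mode the training architecture (skipgram here). The checkpointing and logging flags (--keep-checkpoint-max, --save-summary-steps, --save-checkpoints-steps, --log-step-count-steps) presumably map onto the identically named tf.estimator.RunConfig options.

Training writes standard Estimator checkpoints to --outputdir, so the learned embedding matrix can be read back from the latest checkpoint. A minimal sketch, assuming the embedding variable is named 'embeddings' (an assumption; use the list_variables loop to find the real name and shape in your run):

import tensorflow as tf

model_dir = '/absolute/path/to/word2vec/models'
ckpt = tf.train.latest_checkpoint(model_dir)

# Print (name, shape) for every variable stored in the checkpoint.
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)

# Load a tensor by name; 'embeddings' is an assumed name, so pick the
# [vocab_size, size]-shaped variable from the listing above instead.
reader = tf.train.load_checkpoint(ckpt)
embeddings = reader.get_tensor('embeddings')
print(embeddings.shape)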