embeddix
A small toolkit for processing word embeddings with numpy. You can use embeddix
to convert .txt embeddings (such as GloVe files) to numpy and vice versa.
Install
pip install embeddix
or, after a git clone:
python3 setup.py install
Use
Extract vocabulary from a txt embeddings file
embeddix extract --embeddings /absolute/path/to/embeddings.txt
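For intuition, here is a minimal sketch of what vocabulary extraction amounts to. It is not embeddix's own code. It assumes a GloVe-style layout (one word per line, followed by space-separated floats, with no header line), and the helper name extract_vocab is hypothetical.

```python
import io


def extract_vocab(lines):
    """Return the first space-separated token of each non-empty line.

    Assumes a GloVe-style layout: `word v1 v2 ... vn`, no header line.
    """
    vocab = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        vocab.append(line.split(" ", 1)[0])
    return vocab


# Any iterable of lines works, including an open file handle.
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
print(extract_vocab(sample))  # ['the', 'cat']
```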
Convert from txt to numpy
embeddix convert --to numpy --embeddings /absolute/path/to/embeddings.txt
Convert from numpy to txt
embeddix convert --to txt --embeddings /absolute/path/to/embeddings.npy
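The two conversions can be sketched as below. This is an illustration only, not embeddix's implementation. It assumes the same GloVe-style text layout and keeps the words in a list separate from the matrix, which the separate --vocab argument of the evaluate command suggests but does not confirm. The function names and the six-decimal output format are hypothetical choices.

```python
import numpy as np


def txt_to_numpy(lines):
    """Parse `word v1 v2 ... vn` lines into (words, matrix)."""
    words, rows = [], []
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        rows.append([float(x) for x in parts[1:]])
    return words, np.array(rows, dtype=np.float64)


def numpy_to_txt(words, matrix):
    """Format (words, matrix) back into `word v1 v2 ... vn` lines."""
    return [
        word + " " + " ".join(f"{float(x):.6f}" for x in row)
        for word, row in zip(words, matrix)
    ]


words, matrix = txt_to_numpy(["the 0.1 0.2\n", "cat 0.3 0.4\n"])
print(matrix.shape)
print(numpy_to_txt(words, matrix))
```

The matrix returned by txt_to_numpy can be written to disk with np.save and read back with np.load, both standard numpy calls.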
Evaluate a distributional semantic model (DSM) on intrinsic tasks
Evaluate on lexical similarity (men, simlex, simverb) or concept categorization (essli, ap, battig)
embeddix evaluate \
--embeddings /absolute/path/to/embeddings.npy \
--vocab /absolute/path/to/embeddings.vocab \
--dataset intrinsic_task_dataset_name
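Lexical similarity benchmarks are commonly scored by correlating the model's cosine similarities with human ratings using Spearman's rank correlation. The sketch below shows that general idea under stated assumptions. Whether embeddix computes its scores exactly this way is not confirmed here. The function names and the toy data are hypothetical, and the rank helper breaks ties by position rather than averaging them (scipy.stats.spearmanr handles ties properly).

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def ranks(values):
    """Ordinal ranks (0 = smallest); ties are broken by position."""
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(len(values))
    return result


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])


def evaluate_similarity(words, matrix, pairs):
    """Score (word1, word2, human_rating) pairs; skip out-of-vocabulary pairs."""
    index = {word: i for i, word in enumerate(words)}
    predicted, gold = [], []
    for w1, w2, rating in pairs:
        if w1 in index and w2 in index:
            predicted.append(cosine(matrix[index[w1]], matrix[index[w2]]))
            gold.append(rating)
    return spearman(predicted, gold)


# Toy data, invented for illustration only.
words = ["cat", "dog", "car"]
matrix = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
pairs = [("cat", "dog", 9.0), ("cat", "car", 1.0), ("dog", "car", 2.0)]
print(evaluate_similarity(words, matrix, pairs))
```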