w2vembeddings

This package translates words into vectors for NLP embedding. It works with Chinese, English, or any other language you want to use. It is mainly intended for those who use word2vec locally from txt files. In particular, this implementation primarily helps to construct a separate word vector matrix for embedding in local NLP tasks.


Keywords
text, nlp, machine-learning, chinese, word2vec, neural-networks
License
MIT
Install
pip install w2vembeddings==0.1.2

Documentation

w2vembeddings

This package translates words into vectors for NLP embedding. It works with Chinese, English, or any other language you want to use.

It is mainly intended for those who use word2vec locally from txt files. In particular, this implementation primarily helps to construct a separate word vector matrix for embedding in local NLP tasks.

It is an efficient way for researchers to reuse pre-trained word vectors in their own scenarios.

This project references embeddings.

Install

pip (recommended)

pip install w2vembeddings

github

git clone https://github.com/LG-1/w2vembeddings.git
cd w2vembeddings
python setup.py install

init and manage emb db

from w2vembeddings.managedb import ManageDB
md = ManageDB()
md.list_db()  # list the available dbs
md.add_file2db('test', '../data/test_corpos.txt', 10, 8)  # import a txt file into a db (args appear to be: name, path, vector dimensions, word count)
md.delete_db('test', 10)  # delete a db


Get word vector

import numpy as np
from w2vembeddings.w2vemb import EMB
emb = EMB(name='tencent', dimensions=200)  # open the previously imported 'tencent' db
np.array(emb.get_vector('三生有幸'))  # look up the vector for a word


At this speed, even if your task needs to build a vector matrix for tens of thousands of words, it should only take on the order of seconds.
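As an illustration, here is a minimal sketch of building such an embedding matrix for a task vocabulary. Only EMB and get_vector come from this package; the vocabulary and the handling of missing words are hypothetical assumptions.

import numpy as np
from w2vembeddings.w2vemb import EMB

emb = EMB(name='tencent', dimensions=200)
vocab = ['三生有幸', '自然语言处理', '机器学习']  # hypothetical task vocabulary
matrix = np.zeros((len(vocab), 200))
for i, word in enumerate(vocab):
    vec = emb.get_vector(word)
    if vec is not None:  # assumption: get_vector may return nothing for out-of-vocabulary words
        matrix[i] = np.array(vec)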

Advantages and disadvantages

Advantages

1- No need to import the txt file each time.
2- Does not take up your running memory (RAM); it only needs disk space, since it is based on sqlite3.
3- Fast lookups.

Disadvantages

1- Unable to use global information.

If you need global information, you should probably look at gensim or another package that loads the whole word2vec model into RAM at once.
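For reference, a minimal sketch of that in-memory approach with gensim (not part of this package; the file path is just the Tencent txt mentioned below):

from gensim.models import KeyedVectors

# loads the whole embedding file into RAM, unlike the sqlite3-backed approach above
kv = KeyedVectors.load_word2vec_format('Tencent_AILab_ChineseEmbedding.txt', binary=False)
kv.most_similar('三生有幸')  # global operations such as nearest neighbours become possible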

reference

tencent word2vec

The Tencent Chinese Word2vec can be downloaded from here. For more info, see the reference.

md.add_file2db('tencent', 'Tencent_AILab_ChineseEmbedding.txt', 200, 8824300)  # this is what I used for Chinese in the code above

other embeddings

GloVe

embeddings