KEM

KEM (Keyword Embedding Model) is essentially word2vec. Every tool in our NLP suite starts with the prefix "k", hence the name.

reference

Install

(Recommended): Use docker-compose to install

Manual Install

If you want to integrate kem into your own Django project, install it manually:

pip install kem

Config

Because kem is a Django app, the following Django setup is required.

settings.py:

INSTALLED_APPS = [
    'kem',
    # ... your other apps
]

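urls.py:

# url() and include() as provided by django.conf.urls in older Django releases
from django.conf.urls import include, url
import kem.urls

urlpatterns += [
    url(r'^kem/', include(kem.urls)),
]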

Then build the model:

python3 manage.py buildkem --lang <lang, e.g., zh or en or th> --dimension <int: e.g., 400> --cpus <default=6> --ontology <default=False>

- ontology: experimental feature; see the Experimental Feature section for details.

Finally, run python manage.py runserver and open 127.0.0.1:8000/ to check that the configuration works.

API

Get similar words: /kem

Parameters:
- keyword
- num (default=10)
- ontology (default=False)
- lang (e.g., zh, as used in the examples below)

Example: http://udiclab.cs.nchu.edu.tw/kem?keyword=草履蟲&num=100&lang=zh

["原生動物", 0.7895185351371765]
["藍菌", 0.7865398526191711]
["甲藻", 0.7792112827301025]
["藍綠藻", 0.7636655569076538]
["芽孢", 0.7631546258926392]
["兼性", 0.7622398138046265]
["纖毛蟲", 0.7605307102203369]
["專性", 0.7589520215988159]
["莢膜", 0.7575902938842773]
...

Example: http://udiclab.cs.nchu.edu.tw/kem?keyword=中華民國法務部部長&num=100&lang=zh&ontology=True

["中華民國總統府國策顧問"],
["中華民國內政部部長"],
["中華民國法官"],
["中華民國檢察官"],
["國立臺灣大學法律學院校友"]
...
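The query URLs above contain non-ASCII keywords, which must be percent-encoded when sent from a script. The following is a minimal client sketch using only the Python standard library. The function names are hypothetical, the default lang="zh" is copied from the examples rather than from any documented default, and the JSON response format is an assumption based on the sample output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the examples above; replace it with your own deployment.
BASE_URL = "http://udiclab.cs.nchu.edu.tw/kem"


def build_query_url(keyword, num=10, lang="zh", ontology=False, base_url=BASE_URL):
    """Build a /kem query URL with percent-encoded parameters."""
    params = {"keyword": keyword, "num": num, "lang": lang}
    if ontology:
        # The example URL passes the literal string "True".
        params["ontology"] = "True"
    return base_url + "?" + urlencode(params)


def get_similar(keyword, **kwargs):
    """Fetch similar words (assumption: the response body is JSON)."""
    with urlopen(build_query_url(keyword, **kwargs)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_query_url("草履蟲", num=100))
```

Calling get_similar("草履蟲", num=100) would then send the request. Whether the service is reachable depends on your deployment.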

Get vector: /kem/vector

Parameters:
- keyword

Example: http://udiclab.cs.nchu.edu.tw/kem/vector?keyword=女生&lang=zh

[1.3885987997055054, 0.5394280552864075, -0.2656879723072052, 0.7741730809211731, 0.591987133026123 ...]
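One common use of raw vectors like the one above is to compare two keywords by cosine similarity. The sketch below is a plain-Python illustration and not part of kem itself. The function name and the toy vectors are hypothetical; real vectors would presumably have as many components as the --dimension used when building the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors for illustration only.
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]
print(cosine_similarity(vec_a, vec_b))
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0.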

Experimental Feature

This feature is based on kcem, an ontology of isA relations.

Setting --ontology to True replaces every noun in the training corpus with its hypernym and concatenates this transformed corpus with the original one. word2vec is then trained on the resulting corpus.

This enhances the original vector space: in the comparison below, the nearest neighbors shift from individual names to related roles and categories.

Result (with --ontology):

>>> model.most_similar('中華民國法務部部長')
[
  [
    "中華民國總統府國策顧問",
    0.7841469645500183
  ],
  [
    "中華民國內政部部長",
    0.7837527990341187
  ],
  [
    "中華民國法官",
    0.7816867828369141
  ],
  [
    "中華民國檢察官",
    0.7780462503433228
  ],
  [
    "國立臺灣大學法律學院校友",
    0.7581177949905396
  ]
]

Original (without --ontology):

>>> model.most_similar('中華民國法務部部長')
[
  [
    "楊芳婉",
    0.8307946920394897
  ],
  [
    "吳朱疆",
    0.830314040184021
  ],
  [
    "郭宗德",
    0.8272522687911987
  ],
  [
    "莊懷義",
    0.8246101140975952
  ],
  [
    "蔡兆陽",
    0.821085512638092
  ]
]
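The corpus transformation described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not kem's actual implementation: the HYPERNYMS table is a hypothetical stand-in for kcem lookups, the part-of-speech tags are assumed to come from some tagger, and the order of concatenation (original first) is a guess.

```python
# Hypothetical noun -> hypernym lookup standing in for kcem's isA relations.
HYPERNYMS = {"cat": "animal", "dog": "animal", "rose": "flower"}


def to_hypernym_sentence(tagged_sentence, hypernyms=HYPERNYMS):
    """Replace each noun that has a known hypernym; keep all other tokens."""
    return [
        hypernyms.get(token, token) if pos == "NOUN" else token
        for token, pos in tagged_sentence
    ]


def build_training_corpus(tagged_corpus, hypernyms=HYPERNYMS):
    """Return the original sentences followed by their hypernym versions."""
    original = [[token for token, _ in sent] for sent in tagged_corpus]
    transformed = [to_hypernym_sentence(sent, hypernyms) for sent in tagged_corpus]
    return original + transformed


# Toy pre-tagged corpus: each sentence is a list of (token, POS) pairs.
tagged_corpus = [
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("rose", "NOUN"), ("blooms", "VERB")],
]
for sentence in build_training_corpus(tagged_corpus):
    print(" ".join(sentence))
```

The combined list of tokenized sentences is what would then be passed to word2vec training.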

Built With

Python 3.5

Contributors

張泰瑋 david
游哲軒 Shane Yu

License

This package is licensed under GPL 3.0.
