ckip-client

A Python client for the CKIP Chinese Word Segmentation System


Keywords
chinese, word, segmentation, ckip, client, nlp, linguistics
License
MIT
Install
pip install ckip-client==0.2.0.post1

Documentation

CKIP Client

A Python client for the Chinese Word Segmentation System (see ckipsvr.iis.sinica.edu.tw) provided by Academia Sinica Chinese Knowledge and Information Processing (CKIP) Group.

Installation

Simply run the following command:

pip install ckip-client

If pip is not available, you can also download it manually from PyPI.

Note: Only Python 3 is currently supported.

Example

To use the Chinese Word Segmentation System, you must first register an account on ckipsvr.iis.sinica.edu.tw. You will receive the server's IP address and port number when you finish the registration process.

In the code below, replace the CKIP_* placeholders with your own server IP address, port number, username, and password.

from ckipclient import CKIPClient

sample_text = '第二屆內地搖滾音樂祭將在 9 月 24 日登場!今年共有五十五組獨立樂團'\
              '參與演出,包含曾獲得金曲獎的「舒米恩」、「滅火器」等,陣容十分堅強。'

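# Connect using the server address and account details from registration.
ckip = CKIPClient(CKIP_IP, CKIP_PORT, CKIP_USERNAME, CKIP_PASSWORD)
sample_results = ckip.segment(sample_text)

# Each item in a sentence holds a word and its part-of-speech tag: join the
# two with a space, then separate the items with '/'.
for sentence in sample_results:
    print('/'.join(' '.join(word) for word in sentence))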

The code above will output:

第二 DET/屆 M/內地 N/搖滾 N/音樂 N/祭 Vt/將 ADV/在 P/9 DET/月 N/24 DET/日 M/登場 Vi/! EXCLAMATIONCATEGORY
今年 N/共有 Vt/五十五 DET/組 M/獨立 Vi/樂團 N/參與 Vt/演出 N/, COMMACATEGORY
包含 Vt/曾 ADV/獲得 Vt/金曲獎 N/的 T/「 PARENTHESISCATEGORY/舒米恩 N/」 PARENTHESISCATEGORY/、 PAUSECATEGORY/「 PARENTHESISCATEGORY/滅火器 N/」 PARENTHESISCATEGORY/等 POST/, COMMACATEGORY
陣容 N/十分 ADV/堅強 Vi/。 PERIODCATEGORY
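The segmented result can be post-processed with ordinary Python. The following is a minimal sketch, not part of ckip-client itself: it assumes that each sentence returned by segment() is a sequence of (word, tag) pairs, as the output above suggests, and it uses a hand-written stand-in for the result so that it runs without a server. The helper names words_with_tag and tag_counts are hypothetical.

```python
from collections import Counter

# Hand-written stand-in for the value returned by ckip.segment().
# Assumption: each sentence is a list of (word, part-of-speech tag) pairs.
sample_results = [
    [('今年', 'N'), ('共有', 'Vt'), ('五十五', 'DET'), ('組', 'M'),
     ('獨立', 'Vi'), ('樂團', 'N'), ('參與', 'Vt'), ('演出', 'N'),
     (',', 'COMMACATEGORY')],
    [('陣容', 'N'), ('十分', 'ADV'), ('堅強', 'Vi'), ('。', 'PERIODCATEGORY')],
]


def words_with_tag(results, tag):
    """Return every word labelled with `tag`, in order of appearance."""
    return [word for sentence in results for word, pos in sentence if pos == tag]


def tag_counts(results):
    """Count how often each part-of-speech tag occurs across all sentences."""
    return Counter(pos for sentence in results for _, pos in sentence)


nouns = words_with_tag(sample_results, 'N')
print(nouns)                            # the words tagged N
print(tag_counts(sample_results)['N'])  # how many words carry the N tag
```

If the real return value uses a different shape, only the unpacking in the two comprehensions would need to change.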

For more details, please refer to the documentation.