kuzukiri
A simple text segmenter
What's this?
kuzukiri is a Python library that splits Japanese text into sentences.
Features
- Segments text using simple rules.
- Rule-based, with no machine learning, so results are predictable.
- Comparatively fast, since the core is written in Rust.
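To illustrate what "rule-based" means here, this is a minimal sketch of a punctuation-driven sentence splitter in plain Python. It is not kuzukiri's implementation: the function name `split_sentences` and its single rule (split after a run of 。！？!? plus any closing brackets or quotes) are assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical rule (not kuzukiri's actual rules): a sentence ends at a run of
# 。！？!? optionally followed by closing brackets or quotes.
_SENTENCE_END = re.compile(r"[。！？!?]+[」』）)]*")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences by applying one fixed end-of-sentence rule."""
    sentences = []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    # Keep trailing text that has no sentence-final punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("これはテストです。文分割します。"))
# => ['これはテストです。', '文分割します。']
```

Because the rule is fixed, the same input always produces the same output, which is the kind of predictability the feature list refers to. A rule this naive also splits inside quotations such as 「本当？」と聞いた。, which a more complete rule set would need to handle.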
Install
from PyPI
pip install kuzukiri
from source code
pip install setuptools-rust
python setup.py install
Usage
import kuzukiri
segmenter = kuzukiri.Segmenter()
text = "これはテストです。文分割します。"  # "This is a test. It splits sentences."
sentences = segmenter.split(text)
print(sentences) # => ['これはテストです。', '文分割します。']
For details, see the examples and tests directories.
License
MIT
Dependencies
- PyO3: Rust bindings used to expose the Rust code to Python.
- unicode_normalization crate: for NFKC normalization.
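kuzukiri relies on the Rust unicode_normalization crate for NFKC. The Python snippet below uses the standard library's `unicodedata.normalize` only to show what NFKC does to typical Japanese input; the helper name `nfkc` is hypothetical, and this README does not say whether kuzukiri returns normalized text or normalizes only internally before applying its rules.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Return the NFKC-normalized form of text (Python stdlib, not kuzukiri)."""
    return unicodedata.normalize("NFKC", text)


# Full-width ASCII, half-width katakana, the half-width ideographic full stop
# and circled digits all map to their standard forms under NFKC.
for sample in ["ＡＢＣ１２３", "ﾃｽﾄ｡", "①"]:
    print(sample, "->", nfkc(sample))
```

For example, the half-width string ﾃｽﾄ｡ becomes テスト。, so a segmentation rule applied after normalization only has to recognize one form of the full stop (。) rather than both 。 and ｡.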