kuzukiri

Japanese Text Segmenter for Python written in Rust


Keywords
NLP, Natural, Language, Processing, Text, Segmentation, Python, Rust, Japanese, Preprocessing
License
MIT
Install
pip install kuzukiri==0.1.3

Documentation

ζ—₯本θͺž

kuzukiri

A simple text segmenter

What's this?

This is a Python library for splitting Japanese text into sentences.

Features

  • Text segmentation by simple rules:
    • rule-based, no machine learning,
    • so the results are predictable.
  • Comparatively fast; it is written in Rust.

Install

from PyPI

pip install kuzukiri

from source code

pip install setuptools-rust
python setup.py install

Usage

import kuzukiri

segmenter = kuzukiri.Segmenter()
text = "γ“γ‚Œγ―γƒ†γ‚Ήγƒˆγ§γ™γ€‚ζ–‡εˆ†ε‰²γ—γΎγ™γ€‚"
sentences = segmenter.split(text)
print(sentences)  # => ['γ“γ‚Œγ―γƒ†γ‚Ήγƒˆγ§γ™γ€‚', 'ζ–‡εˆ†ε‰²γ—γΎγ™γ€‚']

For details, see the examples and tests directories.

License

MIT

Dependencies