nlp-jp-gears


Keywords
nlp, japanese
License
MIT
Install
pip install nlp-jp-gears==0.1.1

Documentation

NLP JP Gears

PyPI version

Overview

日本語の自然言語処理で頻出の前処理をまとめたものです。pipelineをしいて、複数の処理をまとめることができます。

API

  • pipelineの作成: composer.Composer
  • 全角英数字記号を半角に変換: zenhan.ZenToHanConverter
  • 半角英数字記号を全角に変換: zenhan.HanToZenConverter
  • 括弧とその間のテキストを削除: remover.TextBtwBracketsRemover

Requirements

Python 3.6+

Installation

pip install nlp-jp-gears

Example

from nlp_jp_gears import Composer
from nlp_jp_gears import (
    ZenToHanConverter,
    TextBtwBracketsRemover
)

txt_btw_brackets_remover = TextBtwBracketsRemover()
zenhan_converter = ZenToHanConverter()

composer = Composer(txt_btw_brackets_remover, zenhan_converter)
text = "Python(パイソン)で自然言語処理?"
out = composer(text)
print(out)

Then, input text is preprocessed.

Pythonで自然言語処理?

And you can check what is removed and converted, as follows,

print(txt_btw_brackets_remover.removes)
print(zenhan_converter.converts)
<{[(「『([〈《〔{«‹

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz