mozcpy

Mozc for Python: yet another Kana-Kanji converter


Keywords
Kana-Kanji, converter, japanese-language, kana-kanji-conversion, mecab, natural-language-processing, python3
License
MIT
Install
pip install mozcpy==0.1

Documentation


INSTALLATION

$ pip install mozcpy

USAGE

import mozcpy

converter = mozcpy.Converter()
converter.convert('まほうしょうじょ')
# => '魔法少女'
converter.convert('まほうしょうじょ', n_best=10)
# => ['魔法少女', '魔法消除', '魔法省所', '魔法小所', '魔法昇叙', '魔砲少女', 'マホウ少女', '魔法証所', '魔法賞所']

converter.convert_wakati('もうなにもこわくない')
# => 'もう 何 も 怖く ない'
converter.convert_wakati('もうなにもこわくない', n_best=3)
# => ['もう 何 も 怖く ない', 'もう 何 も こわく ない', 'もう 何 も 恐く ない']

converter.wakati("もうなにもこわくない")
# => 'もう なに も こわく ない'
converter.wakati("もうなにもこわくない", n_best=10)  # duplicates are ignored
# => ['もう なに も こわく ない']
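
In the examples above, convert returns a single string when n_best is omitted and a list of strings when n_best is given. The following is a minimal sketch of a wrapper that always returns a list, which can simplify calling code. The name list_candidates is hypothetical and not part of mozcpy. The sketch assumes only the call forms and return shapes shown above.

```python
from typing import Any, List


def list_candidates(converter: Any, text: str, n_best: int = 1) -> List[str]:
    """Return conversion candidates as a list, whatever n_best is.

    Hypothetical helper, not part of mozcpy. Assumes `converter.convert`
    behaves as in the usage examples: a plain string when n_best is
    omitted, a list of strings when n_best is passed.
    """
    if n_best <= 1:
        # Single-result form, as in converter.convert(text)
        result = converter.convert(text)
    else:
        # Multi-result form, as in converter.convert(text, n_best=10)
        result = converter.convert(text, n_best=n_best)
    # Normalise both return shapes to a list of strings.
    return [result] if isinstance(result, str) else list(result)
```

With the real library this would be called as list_candidates(mozcpy.Converter(), 'まほうしょうじょ', n_best=3). The returned list may be shorter than n_best, since the example above requests 10 candidates and shows 9.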

FOR DEVELOPERS

The dictionary files in this repository are stored with Git LFS, so Git LFS must be installed to pull them when working from a clone.

ACKNOWLEDGEMENT

This module relies on Mozc and MeCab.

  • T. Kudo, T. Hanaoka, J. Mukai, Y. Tabata, H. Komatsu. 2011. Efficient dictionary and language model compression for input method editors. In Proceedings of the Workshop on Advances in Text Input Methods (WTIM 2011), pp. 19-25.
  • T. Kudo, K. Yamamoto, Y. Matsumoto. 2004. Applying Conditional Random Fields to Japanese Morphological Analysis. In Proceedings of EMNLP 2004, pp. 230-237.