pure-pyawabi
pure-pyawabi is a pure Python implementation of awabi (https://github.com/nakagami/awabi).
If you have a Rust development environment, see also pyawabi (https://github.com/nakagami/pyawabi).
Requirements
Python 3.8+
MeCab dictionary
For example, on Ubuntu:
$ sudo apt install mecab mecab-ipadic-utf8
Install the Python package
$ pip install pure-pyawabi
How to use
pyawabi command
$ echo 'すもももももももものうち' | pyawabi
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
$ echo 'すもももももももものうち' | pyawabi -N 2
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
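With -N 2, pyawabi prints the two best analyses one after another, each ending in an EOS line; every other line holds a surface form followed by its comma-separated features. If you drive the command from another program, a parser along these lines can turn that text back into lists of (surface, features) pairs. This is a minimal sketch: the function name is hypothetical, and it assumes surface forms contain no whitespace and are separated from the features by a tab or space.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, comma-separated feature string)


def parse_pyawabi_output(text: str) -> List[List[Token]]:
    """Split pyawabi command output into one token list per analysis.

    Hypothetical helper: each analysis is terminated by an "EOS" line,
    so `-N 2` output yields two lists.
    """
    analyses: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.strip() == "EOS":
            analyses.append(current)  # end of one analysis
            current = []
            continue
        # Split once on the first run of whitespace (assumed tab or space).
        surface, features = line.split(maxsplit=1)
        current.append((surface, features))
    if current:
        analyses.append(current)  # tolerate output without a final EOS
    return analyses
```

For example, assuming pyawabi is on your PATH, the text could come from subprocess.run(["pyawabi", "-N", "2"], input="すもももももももものうち\n", capture_output=True, text=True).stdout.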
Use as a package
Use the functions
>>> import pyawabi
>>> import pprint
>>> pp = pprint.PrettyPrinter()
>>> pp.pprint(pyawabi.tokenize("すもももももももものうち"))
[('すもも', '名詞,一般,*,*,*,*,すもも,スモモ,スモモ'),
 ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
 ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
 ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
 ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
 ('の', '助詞,連体化,*,*,*,*,の,ノ,ノ'),
 ('うち', '名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ')]
>>> pp.pprint(pyawabi.tokenize_n_best("すもももももももものうち", 2))
[[('すもも', '名詞,一般,*,*,*,*,すもも,スモモ,スモモ'),
  ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
  ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
  ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
  ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
  ('の', '助詞,連体化,*,*,*,*,の,ノ,ノ'),
  ('うち', '名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ')],
 [('すもも', '名詞,一般,*,*,*,*,すもも,スモモ,スモモ'),
  ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
  ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
  ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
  ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
  ('の', '助詞,連体化,*,*,*,*,の,ノ,ノ'),
  ('うち', '名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ')]]
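The second element of each tuple is the raw feature string from the dictionary. With IPAdic (the mecab-ipadic-utf8 package in the Ubuntu example), it holds nine fields: part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. A minimal sketch that gives those fields names; the class and field names are our own, not part of pyawabi, and padding short entries (e.g. unknown words without a reading) with "*" is an assumption.

```python
from typing import NamedTuple


class IpadicFeatures(NamedTuple):
    """The nine comma-separated IPAdic feature fields, in order (names are hypothetical)."""
    pos: str               # 品詞 (part of speech)
    pos_detail1: str       # 品詞細分類1
    pos_detail2: str       # 品詞細分類2
    pos_detail3: str       # 品詞細分類3
    conjugation_type: str  # 活用型
    conjugation_form: str  # 活用形
    base_form: str         # 原形
    reading: str           # 読み
    pronunciation: str     # 発音


def parse_features(features: str) -> IpadicFeatures:
    """Map an IPAdic feature string onto named fields."""
    n = len(IpadicFeatures._fields)
    fields = features.split(",")
    # Assumption: entries with fewer fields are padded with "*";
    # any extra fields beyond nine are dropped.
    fields += ["*"] * (n - len(fields))
    return IpadicFeatures(*fields[:n])
```

For instance, [(s, parse_features(f).reading) for s, f in pyawabi.tokenize(text)] would pair each surface form with its katakana reading.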
>>>
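Each candidate returned by tokenize_n_best is a complete segmentation of the input, so for input without spaces, joining the surface forms should give back the original string; the candidates differ in where they split it and how they tag the pieces. A small sketch (both helper names are hypothetical) for checking that and for locating the first token where two candidates diverge:

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (surface form, feature string)


def surfaces(analysis: List[Token]) -> List[str]:
    """Surface forms of one candidate analysis, in order."""
    return [surface for surface, _features in analysis]


def first_divergence(a: List[Token], b: List[Token]) -> Optional[int]:
    """Index of the first differing token, or None if the analyses are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one analysis is a prefix of the other
    return None
```

Applied to the two candidates above, the first three tokens agree and they diverge at index 3, where one reads も (particle) and the other もも (noun).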
Use a Tokenizer object
>>> tok = pyawabi.Tokenizer()
>>> pp.pprint(tok.tokenize("すもももももももものうち"))
[('すもも', '名詞,一般,*,*,*,*,すもも,スモモ,スモモ'),
 ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
 ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
 ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
 ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
 ('の', '助詞,連体化,*,*,*,*,の,ノ,ノ'),
 ('うち', '名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ')]
>>>
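A Tokenizer object can be created once and reused for many inputs. Assuming the constructor is where the dictionary is loaded (not verified here; check the pyawabi source), reusing one object should be cheaper than calling pyawabi.tokenize for every sentence. A minimal sketch that counts top-level parts of speech over many sentences with a single tokenizer; the function name is hypothetical, and it only relies on the tokenize(str) method shown above returning (surface, features) pairs.

```python
from collections import Counter
from typing import Iterable


def count_pos(tokenizer, sentences: Iterable[str]) -> Counter:
    """Count top-level parts of speech across many sentences.

    `tokenizer` is any object with a tokenize(str) method returning
    (surface, features) pairs, e.g. one pyawabi.Tokenizer() reused for
    every sentence.
    """
    counts: Counter = Counter()
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines
        for _surface, features in tokenizer.tokenize(sentence):
            counts[features.split(",", 1)[0]] += 1  # first field = part of speech
    return counts
```

For example, count_pos(pyawabi.Tokenizer(), open("corpus.txt", encoding="utf-8")) would tally parts of speech line by line, where corpus.txt stands for any hypothetical UTF-8 text file.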