# LEKCut

LEKCut (เล็ก คัด) is a Thai word tokenization library that ports deep learning tokenization models to ONNX.
## Install

```shell
pip install lekcut
```
## How to use

```python
from lekcut import word_tokenize

word_tokenize("ทดสอบการตัดคำ")
# output: ['ทดสอบ', 'การ', 'ตัด', 'คำ']
```
## API

```python
word_tokenize(text: str, model: str = "deepcut", path: str = "default") -> List[str]
```
## Model

- **deepcut** - We ported the deepcut model from `tensorflow.keras` to ONNX. The model and code come from Deepcut's GitHub. The model is here.
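Deepcut-style models predict, for each character, whether it begins a word; the tokenizer then groups characters into words from those per-character predictions. A minimal sketch of that grouping step (the function `mask_to_tokens` and the hand-written mask are illustrative, not part of the LEKCut API):

```python
from typing import List


def mask_to_tokens(text: str, begin_mask: List[int]) -> List[str]:
    """Group characters into words using a per-character word-begin mask.

    begin_mask[i] == 1 means text[i] starts a new word.
    Illustrative sketch only; not the actual LEKCut implementation.
    """
    tokens: List[str] = []
    for ch, begins in zip(text, begin_mask):
        if begins or not tokens:
            tokens.append(ch)  # start a new word
        else:
            tokens[-1] += ch  # extend the current word
    return tokens


# Hand-written mask for the README's sample text:
print(mask_to_tokens("ทดสอบการตัดคำ", [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]))
# → ['ทดสอบ', 'การ', 'ตัด', 'คำ']
```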
## Load custom model

If you have trained a custom model with deepcut, or with another architecture that LEKCut supports, you can load it by passing the model file's path to the `path` argument of `word_tokenize` after porting the model to ONNX.

- How to train a custom model with your dataset using deepcut - Notebook (you need to update `deepcut/train.py` before training the model)
## How to port a model?

See the `notebooks/` directory.
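Since LEKCut consumes ONNX files, porting a trained `tensorflow.keras` model generally means running a Keras-to-ONNX converter. A hedged sketch using the `tf2onnx` command line; the filenames are placeholders, and the exact steps used for LEKCut's bundled model are in `notebooks/`:

```shell
# Assumptions: tf2onnx is installed and your trained Keras weights are
# saved as weights.h5 (placeholder filename).
pip install tf2onnx
python -m tf2onnx.convert --keras weights.h5 --output custom_model.onnx
```

The resulting `.onnx` file is what you would pass to the `path` argument of `word_tokenize`.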