th-preprocessor
Simple Thai Preprocess Functions
Objectives
This repository provides simple preprocessing techniques for Thai sentences and phrases.
Supports
The module supports Python 3.6 and later.
Installation
pip install th-simple-preprocessor
How to Use
from th_preprocessor.preprocess import preprocess
text = '"::::: อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ 21-09-2018 https://www.malaysiakini.com/news/444015"'
words = preprocess(text)
print(words)
# อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ WSNUMBER WSNUMBER WSNUMBER WSLINK
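To give a feel for the kind of substitution shown in the output above, here is a minimal standalone sketch that uses only the standard `re` module. It is not the package's implementation: the patterns, the function name `sketch_normalize`, and the order of the steps are assumptions made for illustration, and the real rules in `th_preprocessor` may differ.

```python
import re

# Hypothetical patterns for illustration; th_preprocessor's actual rules may differ.
LINK_RE = re.compile(r"https?://\S+")   # a URL up to the next whitespace
PUNCT_RE = re.compile(r"[-:\"']")       # a few punctuation marks to drop
NUM_RE = re.compile(r"\d+")             # any run of digits
SPACE_RE = re.compile(r"\s+")           # any run of whitespace


def sketch_normalize(text: str) -> str:
    """Replace links and numbers with placeholder tokens, then tidy spacing."""
    # Links go first so that digits and colons inside a URL are not rewritten.
    text = LINK_RE.sub(" WSLINK ", text)
    text = PUNCT_RE.sub(" ", text)
    text = NUM_RE.sub(" WSNUMBER ", text)
    return SPACE_RE.sub(" ", text).strip()


print(sketch_normalize("call 21-09-2018 https://example.com/a1"))
# call WSNUMBER WSNUMBER WSNUMBER WSLINK
```

The ordering matters in this sketch: if the number step ran before the link step, the digits inside the URL would be replaced first and the link would no longer be recognized as a single unit.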
Package Reference
th_preprocessor.preprocess.normalize_link
th_preprocessor.preprocess.normalize_at_mention
th_preprocessor.preprocess.normalize_email
th_preprocessor.preprocess.normalize_haha
th_preprocessor.preprocess.normalize_num
th_preprocessor.preprocess.normalize_phone
th_preprocessor.preprocess.normalize_accented_chars
th_preprocessor.preprocess.normalize_special_chars
th_preprocessor.preprocess.remove_hashtags
th_preprocessor.preprocess.remove_tag
th_preprocessor.preprocess.remove_dup_spaces
th_preprocessor.preprocess.remove_emoji
th_preprocessor.preprocess.replace_dup_chars
th_preprocessor.preprocess.replace_dup_emojis
th_preprocessor.preprocess.insert_spaces
th_preprocessor.preprocess.normalize_emoji
th_preprocessor.preprocess.remove_others_char
th_preprocessor.preprocess.remove_stopwords
th_preprocessor.preprocess.preprocess
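The functions listed above are individual steps, and a common way to combine steps like these is to apply them one after another to the same string. The sketch below illustrates that pattern with two self-written stand-ins. The names `squeeze_repeats`, `collapse_spaces`, and `run_pipeline` are hypothetical, and nothing here reflects the actual signatures or behavior of the package's functions.

```python
import re
from typing import Callable, Iterable


def squeeze_repeats(text: str) -> str:
    """Shorten any character repeated three or more times to two copies."""
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


def collapse_spaces(text: str) -> str:
    """Collapse runs of spaces into one space and trim both ends."""
    return re.sub(r" {2,}", " ", text).strip()


def run_pipeline(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Apply each step in order, feeding the result into the next step."""
    for step in steps:
        text = step(text)
    return text


print(run_pipeline("soooo   good", [squeeze_repeats, collapse_spaces]))
# soo good
```

Keeping each step as a plain function from string to string makes it easy to reorder steps or leave some out for a given dataset.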
Copyright
All licenses in this repository are copyrighted by their respective authors. Everything else is released under CC0. See LICENSE for details.