A lightweight sentence tokenizer for Korean.
Korean generally uses half-width punctuation, but this tokenizer also supports full-width punctuation. (For details on full-width punctuation in Korean, see https://www.w3.org/TR/klreq/.)
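To illustrate the idea of handling both punctuation styles, here is a minimal, naive splitter built only on Python's standard re module. It is not the kr-sentence implementation, just a sketch: it splits after half-width (., ?, !) or full-width (。？！) terminators followed by whitespace.

```python
import re

def naive_tokenize(paragraph):
    # Illustrative only -- NOT how kr-sentence works internally.
    # Split after a half-width (. ? !) or full-width (。 ？ ！)
    # sentence terminator that is followed by whitespace.
    pattern = r'(?<=[.?!。？！])\s+'
    return [s for s in re.split(pattern, paragraph.strip()) if s]

print(naive_tokenize("저는 미국인이에요. 만나서 반갑습니다."))
# → ['저는 미국인이에요.', '만나서 반갑습니다.']
```

A real tokenizer must also handle abbreviations, quoted sentences, and terminators not followed by whitespace, which is why a dedicated library is preferable to a one-line regex.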
Installation
pip install kr-sentence
Sample Code:
from kr_sentence.tokenizer import tokenize
paragraph_str = "저는 미국인이에요. 만나서 반갑습니다."
sentence_list = tokenize(paragraph_str)
for sentence in sentence_list:
    print(sentence)
Other languages
JavaScript -> https://github.com/Rairye/js-sentence-tokenizers