kr-sentence

Light-weight sentence tokenizer for Korean.


Keywords
Korean, Sentence, Tokenizer
License
Apache-2.0
Install
pip install kr-sentence==0.0.3

Documentation

Korean text generally uses half-width punctuation, but this tokenizer also supports full-width punctuation. For details about full-width punctuation in Korean, see the W3C's Requirements for Hangul Text Layout and Typography (https://www.w3.org/TR/klreq/).
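To illustrate what accepting both punctuation widths involves, here is a minimal sketch of a regex-based splitter. It is a hypothetical helper (`naive_split`) written for this explanation, not kr-sentence's actual algorithm. The set of full-width marks it accepts (U+FF0E, U+FF1F, U+FF01 and U+3002) is also an assumption.

```python
import re
from typing import List

# Hypothetical illustration only; this is NOT how kr-sentence is implemented.
# Terminators: half-width . ? ! plus assumed full-width counterparts
# U+FF0E (fullwidth full stop), U+FF1F (fullwidth question mark),
# U+FF01 (fullwidth exclamation mark) and U+3002 (ideographic full stop).
TERM = r"[.?!\uFF0E\uFF1F\uFF01\u3002]"
NON_TERM = r"[^.?!\uFF0E\uFF1F\uFF01\u3002]"

# A sentence is a run of non-terminators followed by one or more terminators,
# or a final run of non-terminators at the end of the text (no closing mark).
_SENTENCE_RE = re.compile(NON_TERM + "+" + TERM + "+|" + NON_TERM + "+$")


def naive_split(text: str) -> List[str]:
    """Split text into sentences on half-width or full-width end marks."""
    sentences = []
    for match in _SENTENCE_RE.finditer(text):
        # Drop the whitespace that separated this sentence from the previous one.
        sentence = match.group().strip()
        if sentence:  # skip whitespace-only trailing fragments
            sentences.append(sentence)
    return sentences
```

For example, `naive_split("저는 미국인이에요．만나서 반갑습니다！")` returns `["저는 미국인이에요．", "만나서 반갑습니다！"]`. A splitter this simple would also break decimals such as "3.14" and abbreviations into separate pieces, which a real tokenizer has to handle.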

Installation

pip install kr-sentence

Sample code

from kr_sentence.tokenizer import tokenize

# Korean for "I am American. Nice to meet you."
paragraph_str = "저는 미국인이에요. 만나서 반갑습니다."

sentence_list = tokenize(paragraph_str)

for sentence in sentence_list:
    print(sentence)

Other languages

JavaScript -> https://github.com/Rairye/js-sentence-tokenizers