textmentations

A Python library for augmenting Korean text.


License: MIT
Install: pip install textmentations==1.1.0


Textmentations

Textmentations is a Python library for augmenting Korean text. It is inspired by albumentations and uses albumentations as a dependency.

Installation

pip install textmentations

A simple example

Textmentations implements its text augmentation techniques on top of TextTransform, which inherits from the albumentations BasicTransform.

This lets textmentations reuse existing albumentations functionality, such as chaining several transforms with albumentations' Compose.
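To make that reuse concrete, here is a minimal, stdlib-only sketch of the calling convention the example relies on: a transform is called with keyword arguments, applies itself with probability `p`, and returns a dictionary whose `"text"` entry holds the result, while a compose object feeds each transform's output into the next. The classes `MiniTextTransform`, `ReverseWords`, and `MiniCompose` and the method `apply_to_text` are hypothetical illustrations loosely modeled on that pattern; they are not the actual textmentations or albumentations API, which handles many more details.

```python
import random
from typing import Any, Callable, Dict, List, Optional


class MiniTextTransform:
    """Hypothetical base class mimicking the albumentations-style call:
    transform(text=...) returns a dict with a possibly modified "text"."""

    def __init__(self, p: float = 1.0, seed: Optional[int] = None) -> None:
        self.p = p  # probability of applying the transform on a given call
        self.rng = random.Random(seed)

    def apply_to_text(self, text: str) -> str:
        raise NotImplementedError

    def __call__(self, **data: Any) -> Dict[str, Any]:
        result = dict(data)  # copy so the caller's dict is not mutated
        if self.rng.random() < self.p:
            result["text"] = self.apply_to_text(result["text"])
        return result


class ReverseWords(MiniTextTransform):
    """Toy transform (hypothetical), used only to exercise the interface."""

    def apply_to_text(self, text: str) -> str:
        return " ".join(reversed(text.split()))


class MiniCompose:
    """Hypothetical pipeline: passes each transform's output dict to the next."""

    def __init__(self, transforms: List[Callable[..., Dict[str, Any]]]) -> None:
        self.transforms = transforms

    def __call__(self, **data: Any) -> Dict[str, Any]:
        for transform in self.transforms:
            data = transform(**data)
        return data


pipeline = MiniCompose([ReverseWords()])
print(pipeline(text="어제 식당에 갔다.")["text"])  # 갔다. 식당에 어제
```

Because every transform in this sketch returns the whole dictionary, extra keys such as a label pass through untouched, which is what lets the compose object chain transforms without knowing what each one does.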

import textmentations as T
from albumentations import Compose

# "Yesterday I went to a restaurant. I was very thirsty. First I drank a glass of water. Then I enjoyed some sweet and sour pork."
text = "어제 식당에 갔다. 목이 너무 말랐다. 먼저 물 한잔을 마셨다. 그리고 탕수육을 맛있게 먹었다."
rd = T.RandomDeletion(deletion_prob=0.3, min_words_each_sentence=1)
ri = T.RandomInsertion(insertion_prob=0.3, n_times=1)
rs = T.RandomSwap(n_times=3)
sr = T.SynonymReplacement(replacement_prob=0.3)
eda = Compose([rd, ri, rs, sr])

# The transforms are random, so your outputs will differ from the samples below.
print(rd(text=text)["text"])
# 식당에 갔다. 목이 너무 말랐다. 먼저 물. 그리고 탕수육을 맛있게.

print(ri(text=text)["text"])
# 어제 최근 식당에 갔다. 목이 너무 말랐다. 먼저 물 한잔을 마셨다 음료수. 그리고 탕수육을 맛있게 먹었다.

print(rs(text=text)["text"])
# 어제 갔다 식당에. 목이 너무 말랐다. 물 먼저 한잔을 마셨다. 그리고 먹었다 맛있게 탕수육을.

print(sr(text=text)["text"])
# 과거 식당에 갔다. 목이 너무 말랐다. 먼저 소주 한잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.

print(eda(text=text)["text"])
# 식당에 어제 과거. 너무 말랐다. 상수 한잔을 마셨다 맹물. 먹었다 그리고 맛있게.

List of augmentations

References