textmentations

A Python library for augmenting Korean text.

Keywords: text, augmentation, classification
License: MIT
Install: pip install textmentations==1.1.1

Documentation

Textmentations

Textmentations is a Python library for augmenting Korean text. It is inspired by albumentations and uses albumentations as a dependency.

Installation

pip install textmentations

A simple example

Textmentations builds its text augmentation techniques on TextTransform, a class that inherits from albumentations' BasicTransform. This lets textmentations reuse existing albumentations functionality rather than reimplementing it.
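Because the transforms inherit albumentations' conventions, a transform is called with keyword arguments and returns a dictionary, which is why the example below reads results with `["text"]`. A minimal sketch of that convention follows. It is not the textmentations or albumentations source: `SketchTextTransform`, `SketchCompose`, and the toy `ReverseWords` transform are hypothetical names, and the sketch models only two behaviours of `BasicTransform`: data goes in and comes out as keyword targets, and the transform fires with probability `p`.

```python
import random
from typing import Any, Dict, List


class SketchTextTransform:
    """Hypothetical stand-in for TextTransform (not the real class).

    Data is passed as keyword arguments and returned as a dict,
    and the transform is applied with probability p.
    """

    def __init__(self, p: float = 1.0) -> None:
        self.p = p

    def apply(self, text: str) -> str:
        # Subclasses implement the actual text edit here.
        raise NotImplementedError

    def __call__(self, **data: Any) -> Dict[str, Any]:
        # **data is a fresh dict, so updating it does not touch the caller's dict.
        # Extra keys (e.g. a label) pass through unchanged.
        if random.random() < self.p:
            data["text"] = self.apply(data["text"])
        return data


class SketchCompose:
    """Hypothetical stand-in for Compose: feeds each transform's output to the next."""

    def __init__(self, transforms: List[SketchTextTransform]) -> None:
        self.transforms = transforms

    def __call__(self, **data: Any) -> Dict[str, Any]:
        for transform in self.transforms:
            data = transform(**data)
        return data


class ReverseWords(SketchTextTransform):
    """Toy transform used only to exercise the calling convention."""

    def apply(self, text: str) -> str:
        return " ".join(reversed(text.split()))


# The first transform always fires (p=1.0); the second never does (p=0.0).
pipeline = SketchCompose([ReverseWords(p=1.0), ReverseWords(p=0.0)])
print(pipeline(text="a b c")["text"])  # c b a
```

Returning a dictionary rather than a bare string is what lets several targets (for example a text and its label) travel through one pipeline together.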

import textmentations as T

# "Yesterday I went to a restaurant. I was very thirsty. First I drank a glass of water. Then I enjoyed some tangsuyuk (sweet and sour pork)."
text = "어제 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다."
rd = T.RandomDeletion(deletion_prob=0.1, min_words_per_sentence=0.8)
ri = T.RandomInsertion(insertion_prob=0.2, n_times=1)
rs = T.RandomSwap(alpha=1)
sr = T.SynonymReplacement(replacement_prob=0.2)
eda = T.Compose([rd, ri, rs, sr])

print(rd(text=text)["text"])
# ์‹๋‹น์— ๊ฐ”๋‹ค. ๋ชฉ์ด ๋„ˆ๋ฌด ๋ง๋ž๋‹ค. ๋จผ์ € ๋ฌผ ์ž”์„ ๋งˆ์…จ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ํƒ•์ˆ˜์œก์„ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ๋‹ค.

print(ri(text=text)["text"])
# 어제 최근 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다 음료수. 그리고 탕수육을 맛있게 먹었다.

print(rs(text=text)["text"])
# 어제 갔다 식당에. 목이 너무 말랐다. 물 먼저 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다..

print(sr(text=text)["text"])
# 과거 식당에 갔다. 목이 너무 말랐다. 먼저 소주 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.

print(eda(text=text)["text"])
# ์‹๋‹น์— ์–ด์ œ ๊ณผ๊ฑฐ ๊ฐ”๋‹ค. ๋„ˆ๋ฌด ๋ง๋ž๋‹ค. ๋จผ์ € ์ƒ์ˆ˜ ํ•œ ์ž”์„ ๋งˆ์…จ๋‹ค ๋งน๋ฌผ. ๊ทธ๋ฆฌ๊ณ  ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ๋‹ค.

List of augmentations

References