AugLy-jp
Data Augmentation for Japanese Text on AugLy
Augmenter
base_text = "あらゆる現実をすべて自分のほうへねじ曲げたのだ"
Augmenter | Augmented | Description |
---|---|---|
SynonymAugmenter | あらゆる現実をすべて自身のほうへねじ曲げたのだ | Substitutes a similar word using the Sudachi synonym dictionary |
WordEmbsAugmenter | あらゆる現実をすべて関心のほうへねじ曲げたのだ | Uses word2vec, GloVe, or fastText embeddings to apply augmentation |
FillMaskAugmenter | つまり現実を、未来な未来まで変えたいんだ | Uses a masked language model to generate text |
BackTranslationAugmenter | そして、ほかの人たちをそれぞれの道に安置しておられた | Uses two translation models for augmentation |
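To make the SynonymAugmenter row concrete, here is a minimal, self-contained sketch of dictionary-based synonym substitution. It is not the AugLy-jp API: the function `substitute_synonyms`, the parameter `aug_p`, the hand-tokenized input, and the toy dictionary are all assumptions made for illustration. A real pipeline would presumably obtain tokens and synonyms from Sudachi.

```python
import random
from typing import Dict, List, Optional, Sequence


def substitute_synonyms(
    tokens: Sequence[str],
    synonyms: Dict[str, List[str]],
    aug_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Replace each token that has a synonym entry with probability aug_p.

    Hypothetical helper for illustration only; not part of AugLy-jp.
    """
    rng = rng or random.Random()
    out = []
    for token in tokens:
        candidates = synonyms.get(token)
        if candidates and rng.random() < aug_p:
            # Pick one of the listed synonyms at random.
            out.append(rng.choice(candidates))
        else:
            out.append(token)
    # Japanese is written without spaces, so tokens are joined directly.
    return "".join(out)


# Hand-tokenized input and a toy dictionary (assumptions, not Sudachi output).
tokens = ["あらゆる", "現実", "を", "すべて", "自分", "の", "ほう", "へ", "ねじ曲げ", "た", "の", "だ"]
toy_synonyms = {"自分": ["自身"]}

augmented = substitute_synonyms(tokens, toy_synonyms, aug_p=1.0, rng=random.Random(0))
print(augmented)  # あらゆる現実をすべて自身のほうへねじ曲げたのだ
```

Setting `aug_p` below 1.0 leaves some eligible tokens unchanged, which is one simple way to control how far the augmented text drifts from the original.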
Prerequisites
Software | Install Command |
---|---|
Python 3.8.11 | pyenv install 3.8.11 |
Poetry 1.1.* | curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py \| python |
Get Started
Installation
pip install augly-jp
Or clone this repository:
git clone https://github.com/chck/AugLy-jp.git
cd AugLy-jp
poetry install
Test with reformat
poetry run task test
Reformat
poetry run task fmt
Lint
poetry run task lint
Inspired by
- https://github.com/facebookresearch/AugLy
- https://github.com/makcedward/nlpaug
- https://github.com/QData/TextAttack
License
This software includes work distributed under the Apache License 2.0 [1].