augly-jp

Data Augmentation for Japanese Text


Keywords: augly, data-augmentation, ginza, japanese, natural-language-processing, nlpaug, sudachi
License: MIT
Install: pip install augly-jp==2021.9.30


AugLy-jp

Data Augmentation for Japanese Text on AugLy


Augmenter

base_text = "あらゆる現実をすべて自分のほうへねじ曲げたのだ"

| Augmenter | Augmented | Description |
| --- | --- | --- |
| SynonymAugmenter | あらゆる現実をすべて自身のほうへねじ曲げたのだ | Substitutes similar words according to the Sudachi synonym dictionary |
| WordEmbsAugmenter | あらゆる現実をすべて関心のほうへねじ曲げたのだ | Leverages word2vec, GloVe or fastText embeddings to apply augmentation |
| FillMaskAugmenter | つまり現実を、未来な未来まで変えたいんだ | Uses a masked language model to generate text |
| BackTranslationAugmenter | そして、ほかの人たちをそれぞれの道に安置しておられた | Leverages two translation models for augmentation |
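
To make the idea behind synonym substitution concrete, here is a minimal, self-contained sketch. It does not use AugLy-jp's API: the function `substitute_synonyms`, the `aug_p` parameter, the `TOY_SYNONYMS` table, and the hand-made token list are all hypothetical names chosen for illustration. The toy table stands in for the Sudachi synonym dictionary, and the tokens are split by hand where a real pipeline would run a tokenizer.

```python
import random
from typing import Dict, List, Optional

# Hypothetical toy synonym table (an assumption for illustration);
# it stands in for the Sudachi synonym dictionary.
TOY_SYNONYMS: Dict[str, List[str]] = {"自分": ["自身"]}


def substitute_synonyms(
    tokens: List[str],
    synonyms: Dict[str, List[str]],
    aug_p: float = 1.0,
    seed: Optional[int] = None,
) -> List[str]:
    """Replace each token that has synonyms with probability aug_p."""
    rng = random.Random(seed)
    augmented = []
    for token in tokens:
        candidates = synonyms.get(token)
        # random() returns a value in [0, 1), so aug_p=1.0 always
        # substitutes and aug_p=0.0 never does.
        if candidates and rng.random() < aug_p:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(token)
    return augmented


# Tokens split by hand for this sketch; Japanese has no spaces,
# so the result is joined back without a separator.
tokens = ["あらゆる", "現実", "を", "すべて", "自分", "の",
          "ほう", "へ", "ねじ曲げ", "た", "の", "だ"]
augmented = "".join(substitute_synonyms(tokens, TOY_SYNONYMS, aug_p=1.0, seed=0))
print(augmented)
```

With this toy table the only replaceable token is 自分, so the sketch reproduces the SynonymAugmenter row above. Lowering `aug_p` would leave some candidate tokens unchanged, which is one common way to control how aggressive the augmentation is.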

Prerequisites

| Software | Install Command |
| --- | --- |
| Python 3.8.11 | pyenv install 3.8.11 |
| Poetry 1.1.* | curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py \| python |

Get Started

Installation

pip install augly-jp

Or clone this repository:

git clone https://github.com/chck/AugLy-jp.git
cd AugLy-jp
poetry install

Test with reformat

poetry run task test

Reformat

poetry run task fmt

Lint

poetry run task lint

Inspired

License

This software includes work that is distributed under the Apache License 2.0 [1].