FRAUG
The GitHub repository of FRAUG (For Realistic AUGmentations).
🚧 WIP
TODO
| Methods | Sub-method | Sub-submethod | Interest of the method | Pseudo-code for French | Pseudo-code for multilingual | Rust | Example |
|---|---|---|---|---|---|---|---|
| Lexical substitution | Thesaurus | Dictionary of synonyms | | | | | |
| | | WordNet | | | | | |
| | | WoNeF | | | | | |
| | Word embedding | Gensim (Fauconnier) | | | | | |
| | | FastText | | | | | |
| | Masked language model (BERT-like) | Random | | | | | |
| | | POS | | | | | |
| | | TF-IDF | | | | | |
| Back-translation | Marian (Helsinki-NLP models) | | | | | | |
| | M2M100 | | | | | | |
| | See if other models have appeared since | | | | | | |
| Transformation of the text surface | Not relevant in French, will have to be done for English | | | | | | |
| Random noise injection | Spelling mistakes injection | | | | | | |
| | Typing errors injection | | | | | | |
| | Unigram noise injection | | | | | | |
| | Noise injection | | | | | | |
| | Mixed sentences | | | | | | |
| | Random insertion | | | | | | |
| | Random swap | | | | | | |
| | Random deletion | | | | | | |
| Cross-over augmentation | | | | | | | |
| Manipulating the syntax tree | Tense manipulation | | | | | | |
| | Gender manipulation | | | | | | |
| | Number manipulation | | | | | | |
| MixUp | Word MixUp | | | | | | |
| | Sentence MixUp | | | | | | |
| Generative methods | Generate paraphrases | | | | | | |
| | Complexification | | | | | | |
| Text simplification | Text summary | | | | | | |
| | Simplification | | | | | | |
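
To give an idea of what the "Random swap" and "Random deletion" rows could look like once filled in, here is a minimal Python sketch. It is not FRAUG's implementation (the code columns above are still empty): the function names, the whitespace tokenization, and the default values of `n_swaps` and `p` are all assumptions made for illustration.

```python
import random


def random_swap(tokens, n_swaps=1, rng=None):
    """Return a copy of `tokens` with `n_swaps` random pairs of positions swapped.

    Hypothetical helper: name and defaults are illustrative, not FRAUG's API.
    """
    rng = rng or random.Random()
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens  # nothing to swap
    for _ in range(n_swaps):
        # Pick two distinct positions and exchange their tokens.
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p=0.1, rng=None):
    """Return a copy of `tokens` where each token is dropped with probability `p`.

    At least one token is always kept so the output is never empty
    (unless the input itself is empty).
    """
    rng = rng or random.Random()
    tokens = list(tokens)
    if not tokens:
        return tokens
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]


if __name__ == "__main__":
    # Naive whitespace tokenization, assumed here for simplicity.
    sentence = "le chat dort sur le canapé".split()
    rng = random.Random(0)  # fixed seed for reproducible augmentations
    print(" ".join(random_swap(sentence, n_swaps=1, rng=rng)))
    print(" ".join(random_deletion(sentence, p=0.2, rng=rng)))
```

Both functions work on a list of tokens and leave the input untouched, so they can be chained or applied several times to the same sentence to produce multiple augmented variants.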
If you find the project useful, please consider giving it a star.