Clean and prepare text for modeling with machine learning.
- lowercase all text
- strip whitespace
- remove numbers
- remove symbols
- remove URLs
- strip HTML tags
- remove stopwords for the detected language or a language you pass in
- lemmatization or stemming
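As a rough illustration of what these steps do, here is a minimal standard-library sketch. It is not nlpcleaner's implementation: the function name `clean_text` and the tiny `STOPWORDS` set are hypothetical, and language detection and lemmatization/stemming are left out because they usually need a third-party library.

```python
import html
import re

# Hypothetical, deliberately tiny English stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "in", "for"}


def clean_text(text, stopwords=STOPWORDS):
    """Apply the listed cleaning steps in a fixed order (sketch, not the library's code)."""
    # Strip HTML tags first, so URLs inside attributes disappear with the tag.
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode entities such as &amp; so they are treated as ordinary symbols.
    text = html.unescape(text)
    # Remove URLs.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Lowercase everything.
    text = text.lower()
    # Remove numbers.
    text = re.sub(r"\d+", " ", text)
    # Remove symbols (anything that is not a word character or whitespace, plus "_").
    text = re.sub(r"[^\w\s]|_", " ", text)
    # Splitting on whitespace also strips and collapses extra spaces.
    tokens = [t for t in text.split() if t not in stopwords]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_text("Visit <b>the</b> site https://example.com in 2024!"))
```

The order matters in this sketch: tags are removed before URLs so that a link inside an `href` attribute does not swallow the text that follows it.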
Usage
from nlpcleaner import TextCleaner
txt = "Some <b>raw</b> text, see https://example.com"  # hypothetical sample input
TextCleaner(txt).clean()
Tests
pipenv install .
python setup.py test
Publish to PyPI
python setup.py sdist
pip install twine
twine upload dist/*
TODO
- Add tests covering more cases and languages
- Check performance