Clean and prepare text for modeling with machine learning


License

MIT

Install

pip install nlpcleaner==0.3.1

Nlpcleaner

Nlpcleaner supports the following cleaning steps:

  • lowercase all text
  • strip surrounding whitespace
  • remove numbers
  • remove symbols
  • remove URLs
  • strip HTML tags
  • remove stopwords, using either the detected language or a language passed by the caller
  • lemmatization or stemming
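
To make the steps above concrete, here is a minimal standard-library sketch of such a pipeline. It is not nlpcleaner's implementation: the function names (`sketch_clean`, `naive_stem`), the tiny `STOPWORDS` table, and the order of the steps are all assumptions made for illustration, and language detection and lemmatization are left out.

```python
import html
import re

# Hypothetical, tiny stopword table for illustration only (not the library's data).
STOPWORDS = {"en": {"a", "an", "and", "the", "is", "to", "of", "for"}}

TAG_RE = re.compile(r"<[^>]+>")                # HTML tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # bare URLs
NUMBER_RE = re.compile(r"\d+")                 # runs of digits
SYMBOL_RE = re.compile(r"[^\w\s]|_")           # punctuation and other symbols


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def sketch_clean(text, language="en"):
    """Apply the listed cleaning steps in one assumed order."""
    text = TAG_RE.sub(" ", text)     # strip HTML tags first, so URLs inside attributes go too
    text = html.unescape(text)       # decode entities such as &amp;
    text = URL_RE.sub(" ", text)     # remove URLs
    text = text.lower().strip()      # lowercase and strip
    text = NUMBER_RE.sub(" ", text)  # remove numbers
    text = SYMBOL_RE.sub(" ", text)  # remove symbols
    stop = STOPWORDS.get(language, set())
    tokens = [naive_stem(t) for t in text.split() if t not in stop]
    return " ".join(tokens)


print(sketch_clean("<p>Visit https://example.com for 3 Amazing tips!</p>"))
# prints: visit amaz tip
```

Tags are removed before URLs in this sketch so that a URL pattern ending at the next whitespace cannot swallow adjacent markup and the text that follows it.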

Usage

from nlpcleaner import TextCleaner

txt = "Some <b>raw</b> text to clean."  # example input
cleaned = TextCleaner(txt).clean()

Tests

pipenv install .
python setup.py test

Publish to PyPI

python setup.py sdist
pip install twine
twine upload dist/*

TODO

  • Add tests covering more cases and languages.
  • Check performance.
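
For the performance item, one possible starting point is a small timing helper built on the standard `timeit` module. The helper name `seconds_per_call` and the `stand_in` function are hypothetical; the stand-in exists only so the sketch runs without nlpcleaner installed.

```python
import timeit


def seconds_per_call(clean_fn, sample, repeats=5, number=100):
    """Return the best observed average time per call, in seconds."""
    timer = timeit.Timer(lambda: clean_fn(sample))
    best = min(timer.repeat(repeat=repeats, number=number))
    return best / number


# Stand-in callable (an assumption, not part of nlpcleaner).
# To measure the library, pass `lambda s: TextCleaner(s).clean()` instead.
def stand_in(s):
    return s.lower().strip()


per_call = seconds_per_call(stand_in, "Some <b>RAW</b> text " * 50)
print(f"{per_call * 1e6:.1f} microseconds per call")
```

Taking the minimum over several repeats reduces noise from other processes on the machine; the absolute numbers will still vary by hardware and input size.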