Simple package for generating n-grams and bag-of-words representations from text.


Keywords
nlp, text, ngram, ngrams
License
GPL-3.0
Install
pip install text2math==0.0.8.dev1

Documentation

A simple package for demonstrating basic Natural Language Processing (NLP) feature engineering in Python.
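To make the two representations concrete, here is a minimal sketch in plain Python. It does not use this package's API; the helper names `tokenize`, `ngrams`, and `bag_of_words` are hypothetical and chosen only for illustration, and the whitespace tokenizer is a deliberate simplification.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # Zip the token list against copies of itself shifted by 1, 2, ..., n-1.
    return list(zip(*(tokens[i:] for i in range(n))))


def bag_of_words(tokens):
    """Map each distinct token to the number of times it occurs."""
    return Counter(tokens)


tokens = tokenize("The cat sat on the mat")
bigrams = ngrams(tokens, 2)
bow = bag_of_words(tokens)

print(bigrams[:2])
print(bow.most_common(1))
```

A bag of words discards word order entirely, while n-grams keep a small window of local order; that trade-off is the main reason to generate both.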

More Info:

Practice Dataset

Stack Exchange Data Dump

Text Encoding

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

Packages

  • chardet - Universal encoding detector for Python 2 and 3
  • cchardet - Universal encoding detector. This library is faster than chardet
  • ftfy - fixes text for you
  • unidecode - ASCII transliterations of Unicode text

Natural Language Processing

Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements

Functional Programming in Python

Functional programming in Python: Examine the functional aspects of Python, which options work well, and which ones you should avoid, by David Mertz

Packages

  • toolz - Toolz provides a set of utility functions for iterators, functions, and dictionaries.
  • functools - Higher-order functions and operations on callable objects.
  • itertools - Functions creating iterators for efficient looping.
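
As a rough illustration of how these functional tools apply to text features, the sketch below builds a lazy n-gram window with itertools and chains processing steps with functools.reduce. It uses only the standard library; `sliding_window` and `compose` are hypothetical helpers written for this example, not part of this package.

```python
from functools import reduce
from itertools import islice, tee


def sliding_window(iterable, n):
    """Lazily yield n-item tuples of consecutive elements from any iterable."""
    iterators = tee(iterable, n)
    # Advance the i-th copy by i items so the copies are staggered.
    shifted = (islice(it, i, None) for i, it in enumerate(iterators))
    return zip(*shifted)


def compose(*funcs):
    """Chain functions left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


# Lowercase, tokenize on whitespace, window into trigrams, then materialize.
to_trigrams = compose(
    str.lower,
    str.split,
    lambda toks: sliding_window(toks, 3),
    list,
)

trigrams = to_trigrams("To be or not to be")
print(trigrams)
```

Because `sliding_window` returns an iterator, it also works on token streams too large to hold in memory; the final `list` step is only there to make the result easy to inspect.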