mwtextextractor

Extracts body text from MediaWiki wikitext by stripping out templates, HTML tags, tables, headers, etc.


Keywords: mediawiki
License: MIT

Install:
pip install mwtextextractor==0.1

Documentation

mwtextextractor extracts plain body text from MediaWiki wikitext by stripping out templates, HTML tags, tables, headers, etc. The extracted text is intended for tasks such as word counting.

Example:

from mwtextextractor import get_body_text
# Templates such as {{ipsum}} are removed; the surrounding words remain.
print(get_body_text('Lorem {{ipsum}} dolor'))
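To illustrate the kind of stripping described above, here is a minimal, self-contained sketch in plain Python. It is not mwtextextractor's implementation and does not use its API; the helpers strip_templates, body_text and word_count are hypothetical. It assumes tables are not nested, keeps the inner text of HTML tags, and leaves links and bold/italic markup in place (the regex word counter simply skips the brackets and quotes).

```python
import re

# Hypothetical sketch of wikitext stripping; NOT mwtextextractor's own code.
TABLE_RE = re.compile(r"^\{\|.*?^\|\}[ \t]*$", re.DOTALL | re.MULTILINE)
HEADER_RE = re.compile(r"^=+[^\n]*?=+[ \t]*$", re.MULTILINE)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")
WORD_RE = re.compile(r"\w+")


def strip_templates(text):
    """Drop {{...}} templates, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith("{{", i):
            depth += 1
            i += 2
        elif depth > 0 and text.startswith("}}", i):
            depth -= 1
            i += 2
        else:
            if depth == 0:  # only keep characters outside any template
                out.append(text[i])
            i += 1
    return "".join(out)


def body_text(wikitext):
    """Return wikitext with templates, tables, headers, comments and tags removed."""
    text = strip_templates(wikitext)
    text = TABLE_RE.sub("", text)    # non-nested {| ... |} tables
    text = HEADER_RE.sub("", text)   # == Heading == lines
    text = COMMENT_RE.sub("", text)  # <!-- comments -->
    text = TAG_RE.sub("", text)      # HTML tags; their inner text is kept
    return text


def word_count(wikitext):
    """Count runs of word characters in the stripped text."""
    return len(WORD_RE.findall(body_text(wikitext)))


if __name__ == "__main__":
    sample = (
        "== Intro ==\n"
        "Lorem {{ipsum|x={{nested}}}} dolor <b>sit</b> amet.\n"
        "{|\n| cell || cell\n|}\n"
    )
    print(word_count(sample))  # → 4 (Lorem, dolor, sit, amet)
```

Tracking brace depth rather than using a single regex lets nested templates like {{ipsum|x={{nested}}}} disappear as one unit, which a non-greedy {{.*?}} pattern would get wrong.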