zanpakuto

A small utility to simplify HTML excerpts


License
Other
Install
pip install zanpakuto==0.2

Documentation

zanpakuto is a small Python library that simplifies HTML excerpts. Its main objective is to normalize odd, deeply nested, non-semantic, heavily styled fragments of HTML containing text into cleaner HTML.

This project makes use of the awesome lxml and is inspired by htmllaundry.

Usage examples

strip_html

from zanpakuto import strip_html
html = "<h1>Some html with <i>italics</i> and <b>bold</b> tags."
print(strip_html(html))

It will output:

Some html with italics and bold tags.
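
To make the idea behind tag stripping concrete, here is a minimal sketch that uses only the Python 3 standard library. It is not zanpakuto's implementation (which builds on lxml), and strip_tags_sketch is a hypothetical name chosen for illustration. It simply keeps the text nodes and throws every tag away:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)


def strip_tags_sketch(html):
    # Hypothetical helper for illustration; not part of zanpakuto.
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


html = "<h1>Some html with <i>italics</i> and <b>bold</b> tags."
print(strip_tags_sketch(html))
# prints: Some html with italics and bold tags.
```

A real implementation would likely also need to deal with whitespace between block elements and with content such as scripts or styles, which this sketch does not attempt.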

simplify_html

from zanpakuto import simplify_html
html = u"A<div><div>B</div><div><div>C</div></div></div><div>D</div>"
print(simplify_html(html))

It will output:

<p>A</p><p>B</p><p>C</p><p>D</p>
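
The flattening shown above can be approximated with a short standard-library sketch. This is only an illustration of the general idea, under the assumption that every block-level boundary ends the current paragraph. It is not how zanpakuto works internally, and flatten_blocks_sketch and BLOCK_TAGS are hypothetical names:

```python
from html import escape
from html.parser import HTMLParser

# Assumed set of block-level tags for this sketch only.
BLOCK_TAGS = {"div", "p", "section", "article"}


class _ParagraphFlattener(HTMLParser):
    """Turns each run of text between block boundaries into one paragraph."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = []

    def flush(self):
        # Emit the buffered text as a paragraph, skipping empty runs.
        text = "".join(self._buffer).strip()
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._buffer.append(data)


def flatten_blocks_sketch(html):
    # Hypothetical helper for illustration; not part of zanpakuto.
    parser = _ParagraphFlattener()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text after the last tag
    return "".join(
        "<p>%s</p>" % escape(text, quote=False) for text in parser.paragraphs
    )


html = "A<div><div>B</div><div><div>C</div></div></div><div>D</div>"
print(flatten_blocks_sketch(html))
# prints: <p>A</p><p>B</p><p>C</p><p>D</p>
```

Note that this sketch drops inline tags such as i and b while keeping their text, which may differ from what simplify_html does.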

Take a look at the tests package for more examples.