html-purifier3

Strips tags and attributes that are not in the whitelist from HTML. The content of removed tags is kept.


License
GPL-3.0
Install
pip install html-purifier3==2.1.0

Documentation

Python HTML purifier

About

Strips tags and attributes that are not in the whitelist from HTML. The content of removed tags is kept. The whitelist has the following form:

{
    'enabled tag name' : ['list of enabled tag\'s attributes']
}

You can use the symbol * to allow all tags and/or attributes.

Note that script and style tags are removed together with their content.

This module is built on the HTMLParser class from the Python standard library. It has no other dependencies, which can sometimes be a plus.
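To make the whitelist rules above concrete, here is a minimal sketch of the same idea written directly against the standard library's html.parser. It is not the code of this package: the names WhitelistSketch and purify_sketch are hypothetical, and the handling of a '*' tag key is an assumption based on the description above.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistSketch(HTMLParser):
    """Hypothetical illustration only; not the html-purifier3 implementation."""

    # Tags whose content is dropped along with the tag itself.
    DROP_WITH_CONTENT = {"script", "style"}

    def __init__(self, whitelist):
        super().__init__(convert_charrefs=True)
        self.whitelist = whitelist
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def _allowed_attrs(self, tag):
        # Returns the allowed attribute list, or None if the tag is not allowed.
        # Assumption: a '*' key in the whitelist applies to every other tag.
        if tag in self.whitelist:
            return self.whitelist[tag]
        return self.whitelist.get("*")

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        allowed = self._allowed_attrs(tag)
        if self.skip_depth or allowed is None:
            return  # tag is dropped; its text still reaches handle_data
        kept = [(k, v) for k, v in attrs if "*" in allowed or k in allowed]
        rendered = "".join(
            ' {}="{}"'.format(k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<{}{}>".format(tag, rendered))

    def handle_endtag(self, tag):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and self._allowed_attrs(tag) is not None:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references were decoded by the parser; re-escape text.
            self.out.append(escape(data, quote=False))


def purify_sketch(html, whitelist):
    parser = WhitelistSketch(whitelist)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Calling purify_sketch(html, {'div': ['*'], 'span': ['attr-2']}) keeps every attribute on div, keeps only attr-2 on span, and unwraps all other tags while leaving their text in place. Void elements, comments, and malformed markup are deliberately not handled in this sketch.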

Some additional information is on my blog

Package on PyPI

Installation

$ pip install html-purifier3

Basic Usage

>>> from purifier.purifier import HTMLPurifier
>>> purifier = HTMLPurifier({
...     'div': ['*'],  # allow all attributes on the div tag
...     'span': ['attr-2'],  # allow only the attr-2 attribute on the span tag
...     # all other tags are removed, but their content is kept
... })
>>> print(purifier.feed('<div class="e1" id="e1">Some <b>HTML</b> for <span attr-1="1" attr-2="2">purifying</span></div>'))
<div class="e1" id="e1">Some HTML for <span attr-2="2">purifying</span></div>

Django Usage

The purifier is typically used in models and forms. The package provides purifier.models.PurifyedCharField and purifier.models.PurifyedTextField for the Django ORM, and purifier.forms.PurifyedCharField for Django forms.