lxml-html-clean

HTML cleaner from lxml project


License
BSD-3-Clause
Install
pip install lxml-html-clean==0.1.1

lxml_html_clean

Motivation

This project was originally part of lxml. Because the HTML cleaner is blocklist-based, many reports of possible security vulnerabilities were filed against lxml, which made lxml problematic for security-sensitive environments. We therefore decided to extract the problematic part into a separate project.

Installation

You can install this project directly with pip install lxml_html_clean, or soon as an extra of lxml with pip install lxml[html_clean]. Either way, this project is installed together with lxml itself.
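
Once installed, a quick way to confirm the package is importable is to run a small cleaning call. The following is only a minimal sketch: the helper names cleaner_available and strip_scripts are hypothetical and chosen for illustration, and the Cleaner options shown (scripts, javascript, style) are one possible configuration, not a recommended security policy. If the package is not installed, the helper returns None instead of raising.

```python
import importlib.util
from typing import Optional


def cleaner_available() -> bool:
    """Return True if the lxml_html_clean package can be imported."""
    return importlib.util.find_spec("lxml_html_clean") is not None


def strip_scripts(html: str) -> Optional[str]:
    """Remove scripts and styles from an HTML fragment (illustrative helper).

    Returns None when lxml_html_clean is not installed.
    """
    if not cleaner_available():
        return None
    # Imported lazily so this module still loads without the package.
    from lxml_html_clean import Cleaner

    # Assumed configuration for this sketch: drop <script> tags,
    # JavaScript attributes, and <style> content.
    cleaner = Cleaner(scripts=True, javascript=True, style=True)
    return cleaner.clean_html(html)


if __name__ == "__main__":
    sample = '<div><script>alert(1)</script><p onclick="x()">Hi</p></div>'
    print(strip_scripts(sample))
```

Because the cleaner is blocklist-based, as noted in the Motivation section, this kind of call should not be treated as a complete defence for untrusted input.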

Documentation

https://lxml-html-clean.readthedocs.io/
