net.sourceforge.htmlcleaner:htmlcleaner

HtmlCleaner is an HTML parser written in Java. It transforms dirty HTML into well-formed XML, following the same rules that most web browsers use.


License
BSD-3-Clause