html5/htmlreader

Html5 stream tokenizer/reader (not using libxml)


Keywords
HTML5, html, tokenizer, parsing
License
MIT


HTMLReader

HtmlReader is a very simple HTML parser that is not built on libxml. It is intended as a replacement for XMLReader, which does not parse HTML5 input properly. It is faster than DOM and leaves whitespace exactly as it appears in the input.

It does not check whether elements are properly closed, so you can (and have to) handle that yourself.
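
Since balancing is left to the caller, one way to handle it is to keep a stack of open elements and compare each closing tag against the top of the stack. The following is a minimal sketch only: `OpenTagStack` and its methods are hypothetical names and not part of html5/htmlreader. You would call `open()` and `close()` from your own handler.

```php
<?php
// Hypothetical helper, NOT part of html5/htmlreader: tracks open elements
// so that the caller can detect unbalanced markup.
final class OpenTagStack
{
    // HTML void elements never have a closing tag, so they are not tracked.
    private const VOID_ELEMENTS = [
        'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
        'input', 'link', 'meta', 'source', 'track', 'wbr',
    ];

    /** @var string[] names of the currently open elements, outermost first */
    private array $stack = [];

    public function open(string $name): void
    {
        $name = strtolower($name);
        if (!in_array($name, self::VOID_ELEMENTS, true)) {
            $this->stack[] = $name;
        }
    }

    // Returns true only if $name matches the innermost open element.
    public function close(string $name): bool
    {
        $name = strtolower($name);
        if ($this->stack === [] || end($this->stack) !== $name) {
            return false;
        }
        array_pop($this->stack);
        return true;
    }

    /** @return string[] elements that are still open */
    public function unclosed(): array
    {
        return $this->stack;
    }
}

$tags = new OpenTagStack();
$tags->open('div');
$tags->open('br');   // void element, not pushed
$tags->open('p');
var_dump($tags->close('p'));    // matches the innermost element
var_dump($tags->close('span')); // does not match: "div" is innermost
print_r($tags->unclosed());     // "div" was never closed
```

The sketch reports a mismatch and leaves the stack untouched. Whether to recover from a mismatch, for example by popping until a matching element is found, is a design choice for your application.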

Installation

Use Composer to install the package from Packagist:

composer require html5/htmlreader

Usage

$reader = new HtmlReader();
$reader->loadHtml("input.html"); // load the input from a file
// $reader->loadHtmlString("<html></html>"); // or parse a string instead

$reader->setHandler(new HtmlCallback()); // <-- write your own HtmlCallback
$reader->parse();
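
To give an idea of what a handler might do, here is a rough sketch that collects link targets. The method names and parameters (`onTagOpen`, `onText`, `onTagClose`) are assumptions made for illustration and are not taken from the package. Check the HtmlCallback interface in the source for the real signatures. The class is driven by hand here so the example runs without the library.

```php
<?php
// Hypothetical handler shape: method names and parameters are assumptions,
// not the actual HtmlCallback interface of html5/htmlreader.
final class LinkCollector
{
    /** @var string[] href values seen so far */
    public array $links = [];

    public function onTagOpen(string $name, array $attributes): void
    {
        // Record the target of every <a> element that has an href attribute.
        if (strtolower($name) === 'a' && isset($attributes['href'])) {
            $this->links[] = $attributes['href'];
        }
    }

    public function onText(string $text): void
    {
        // Text is ignored in this sketch.
    }

    public function onTagClose(string $name): void
    {
        // Closing tags are ignored in this sketch.
    }
}

// Simulate the events a tokenizer might emit for:
//   <p><a href="/docs">Docs</a></p>
$handler = new LinkCollector();
$handler->onTagOpen('p', []);
$handler->onTagOpen('a', ['href' => '/docs']);
$handler->onText('Docs');
$handler->onTagClose('a');
$handler->onTagClose('p');

print_r($handler->links);
```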

Debugging

The package ships with a DebugHtmlCallback handler.

New in Version 1.1.0

  • Added support for namespaces
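
How the reader reports namespaces to a handler is not described here. If you need to separate a prefix from a local name yourself (for example `svg:rect`), a small helper along these lines would do. `splitQualifiedName` is a hypothetical function, not part of the package.

```php
<?php
// Hypothetical helper, NOT part of html5/htmlreader:
// splits a qualified name such as "svg:rect" into [prefix, localName].
// The prefix is null when the name contains no colon.
function splitQualifiedName(string $qname): array
{
    $pos = strpos($qname, ':');
    if ($pos === false) {
        return [null, $qname];
    }
    return [substr($qname, 0, $pos), substr($qname, $pos + 1)];
}

[$prefix, $local] = splitQualifiedName('svg:rect');
echo $prefix, ' / ', $local, PHP_EOL; // prints "svg / rect"
```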

Credits

Written by Matthias Leuffen http://leuffen.de