fb55/htmlparser2


Forgiving html and xml parser

License: MIT

Language: TypeScript

Keywords: dom, html-parser, javascript


htmlparser2


A forgiving HTML/XML/RSS parser. The parser can handle streams and provides a callback interface.

Installation

npm install htmlparser2

A live demo of htmlparser2 is available here.

Usage

const htmlparser2 = require("htmlparser2");
const parser = new htmlparser2.Parser(
    {
        onopentag(name, attribs) {
            if (name === "script" && attribs.type === "text/javascript") {
                console.log("JS! Hooray!");
            }
        },
        ontext(text) {
            console.log("-->", text);
        },
        onclosetag(tagname) {
            if (tagname === "script") {
                console.log("That's it?!");
            }
        }
    },
    { decodeEntities: true }
);
parser.write(
    "Xyz <script type='text/javascript'>var foo = '<<bar>>';</ script>"
);
parser.end();

Output (simplified):

--> Xyz
JS! Hooray!
--> var foo = '<<bar>>';
That's it?!
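
Because the parser accepts input through repeated write() calls, it can be fed piece by piece as data arrives. The following is a minimal sketch of that pattern, collecting link targets from chunked input. The names createLinkCollector and collectLinks are hypothetical helpers made up for this illustration; only Parser, write() and end() come from the usage example above.

```javascript
// Builds a handler object that records the href of every <a> tag it sees.
// createLinkCollector and collectLinks are illustrative names, not library APIs.
function createLinkCollector() {
    const links = [];
    return {
        links,
        onopentag(name, attribs) {
            if (name === "a" && attribs.href) {
                links.push(attribs.href);
            }
        },
    };
}

// Feeds the input to the parser chunk by chunk, as a stream consumer would.
function collectLinks(chunks) {
    // Loaded lazily so the handler above can be exercised without the package.
    const htmlparser2 = require("htmlparser2");
    const handler = createLinkCollector();
    const parser = new htmlparser2.Parser(handler, { decodeEntities: true });
    for (const chunk of chunks) {
        parser.write(chunk);
    }
    parser.end();
    return handler.links;
}
```

A call such as collectLinks(['<a hr', 'ef="/docs">Docs</a>']) passes a tag that is split across two chunks. Keeping the state in the handler object rather than in module-level variables lets several parsers run side by side.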

Documentation

Read more about the parser and its options in the wiki.

Get a DOM

The DomHandler (known as DefaultHandler in the original htmlparser module) produces a DOM (document object model) that can be manipulated using the DomUtils helper.

The DomHandler, while still bundled with this module, was moved to its own module. Have a look at it for further information.
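
As a rough sketch of how the DomHandler can be used, the example below parses a string into a DOM and then walks the result to gather its text. It assumes that the DomHandler constructor takes an (error, dom) callback, that text nodes have type "text" with a data property, and that container nodes expose a children array. collectText and textOf are hypothetical helpers written for this illustration; DomUtils may already offer an equivalent, so treat the hand-written walk as a way to see the tree shape rather than as the recommended approach.

```javascript
// Recursively concatenates the text nodes of a DomHandler-style tree.
// Assumption: each node has a `type`, text nodes carry `data`, and
// container nodes carry a `children` array.
function collectText(nodes) {
    let text = "";
    for (const node of nodes) {
        if (node.type === "text") {
            text += node.data;
        } else if (Array.isArray(node.children)) {
            text += collectText(node.children);
        }
    }
    return text;
}

// Parses `html` into a DOM and returns its text content.
function textOf(html) {
    // Loaded lazily so collectText can be tried on plain objects too.
    const htmlparser2 = require("htmlparser2");
    let result = "";
    // Assumption: the callback receives (error, dom) once parsing has ended.
    const handler = new htmlparser2.DomHandler((error, dom) => {
        if (error) throw error;
        result = collectText(dom);
    });
    const parser = new htmlparser2.Parser(handler);
    parser.write(html);
    parser.end();
    return result;
}
```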

Parsing RSS/RDF/Atom Feeds

const feed = htmlparser2.parseFeed(content, options);

Note: While the provided feed handler works for most feeds, you might want to use danmactough/node-feedparser, which is much better tested and actively maintained.
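
To illustrate what can be done with the value parseFeed returns, here is a small sketch that reduces a feed to its title and item links. The field names used (title, items, link) are assumptions about the shape of the returned object and should be checked against the installed version. summarizeFeed and readFeed are hypothetical helpers, and the xmlMode option is passed on the assumption that feeds should be parsed as XML.

```javascript
// Reduces a parsed feed to its title and a list of item titles and links.
// Assumption: the feed object has `title` and `items`, and each item has
// `title` and `link`; missing values fall back to empty defaults.
function summarizeFeed(feed) {
    if (!feed) {
        return { title: null, items: [] };
    }
    const items = (feed.items || []).map((item) => ({
        title: item.title || "",
        link: item.link || "",
    }));
    return { title: feed.title || null, items };
}

// Parses `xml` with parseFeed and summarizes the result.
function readFeed(xml) {
    // Loaded lazily so summarizeFeed can be used on plain objects as well.
    const htmlparser2 = require("htmlparser2");
    const feed = htmlparser2.parseFeed(xml, { xmlMode: true });
    return summarizeFeed(feed);
}
```

Guarding against a missing feed object keeps the caller from crashing on input that turns out not to be a feed at all.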

Performance

After this project had relied on artificial benchmarks for some time, @AndreasMadsen published his htmlparser-benchmark, which benchmarks HTML parsers against real-world websites.

At the time of writing, the latest versions of all supported parsers show the following performance characteristics on Travis CI (please note that Travis doesn't guarantee equal conditions for all tests):

gumbo-parser   : 34.9208 ms/file ± 21.4238
html-parser    : 24.8224 ms/file ± 15.8703
html5          : 419.597 ms/file ± 264.265
htmlparser     : 60.0722 ms/file ± 384.844
htmlparser2-dom: 12.0749 ms/file ± 6.49474
htmlparser2    : 7.49130 ms/file ± 5.74368
hubbub         : 30.4980 ms/file ± 16.4682
libxmljs       : 14.1338 ms/file ± 18.6541
parse5         : 22.0439 ms/file ± 15.3743
sax            : 49.6513 ms/file ± 26.6032

How does this module differ from node-htmlparser?

This module started as a fork of the htmlparser module. The main difference is that htmlparser2 is intended to be used only with node (it runs on other platforms using browserify). htmlparser2 was rewritten multiple times and, while it maintains an API that's compatible with htmlparser in most cases, the projects don't share any code anymore.

The parser now provides a callback interface inspired by sax.js (originally targeted at readabilitySAX). As a result, old handlers won't work anymore.

The DefaultHandler and the RssHandler were renamed to DomHandler and FeedHandler to clarify their purpose. The old names are still available when requiring htmlparser2, so existing code that uses them should continue to work.

Security contact information

To report a security vulnerability, please use the Tidelift security contact. Tidelift will coordinate the fix and disclosure.

htmlparser2 for enterprise

Available as part of the Tidelift Subscription

The maintainers of htmlparser2 and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source dependencies you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact dependencies you use. Learn more.

Recent Tags

v4.0.0 August 02, 2019
v3.10.1 February 14, 2019
v3.10.0 October 21, 2018
v3.9.2 October 18, 2016
v3.9.1 June 12, 2016
v3.9.0 December 01, 2015
v3.8.3 June 05, 2015
v3.8.2 November 04, 2014
v3.8.1 November 04, 2014
v3.8.0 October 22, 2014
v3.7.3 July 09, 2014
v3.7.2 May 19, 2014
v3.7.1 March 22, 2014
v3.7.0 March 17, 2014
v3.6.0 March 15, 2014
