sax-super-stream
Transform stream converting XML into object by applying hierarchy of element parsers. It's implemented using sax parser, which allows it to process large XML files in a memory efficient manner. It's very flexible: by configuring element parsers only for those elements, from which you need to extract data, you can avoid creating an intermediary representation of the entire XML structure.
Install
$ npm install --save sax-super-stream
Usage
Example below shows how to print the titles of the articles from RSS feed.
const PARSERS = {
'rss': {
'channel': {
'item': {
$: stream.object,
'title': {
$text(text, o) { o.title = text; }
}
}
}
}
};
const res = await fetch('http://blog.npmjs.org/rss');
const rssStream = res.body
.pipeThrough(new TextDecoderStream())
.pipeThrough(stream(PARSERS));
for await (const item of rssStream) {
console.log('title: %s', item.title);
}
More examples can be found in Furkot GPX and KML importers.
API
stream(parserConfig[, options])
Create transform stream that reads XML and writes objects
-
parserConfig
- contains hierarchical configuration of element parsers, each entry correspondes to the XML element tree, each value describes the action performed when an element is encountered during XML parsing -
options
- optional set of options passed to sax parser - defaults are as follows-
trim
- true -
normalize
- true -
lowercase
- false -
xmlns
- true -
position
- false -
strictEntities
- true -
noscript
- true
-
parserConfig
parserConfig
is a hierarchical object that contains references to either parse functions or other parseConfig
objects
parse function - function(xmlnode, object, context)
-
xmlnode
- sax node with attributes -
object
- contains reference to the currently constructed object if any -
context
- provided to be used by parser functions, it can be used to store intermediatry data
this
is bound to current parsed object stack
parse config reference - object
each propery of the object represents a direct child element of the parsed node in XML hierachy,
special $
is a self reference
'item': parseItemFunction
is the same as:
'item': {
'$': parseItemFunction
}
special values
-
$after
-function(object, context)
- called when element tag is closed, element content is parsed -
$text
-function(text, object, context)
- called when element content is encountered -
$uri
-string
- if specified it should match element namespace, otherwise element will be ignored, if$uri
is not specified namespaces are ignored
predefined parsers
There are several predefined parser functions that can be used in parser config:
-
object(name)
- creates a new object and optionally assigns it to parent'sname
property -
collection(name)
- creates a new Array and optionally assigns it to parent'sname
property -
appendToCollection(name)
- create a new object and append to Array stored in parent'sname
property, create a new Array if it does not exist yet -
assignTo(name)
- assign value to the parent's propertyname
License
MIT © Damian Krzeminski