@wikipathways/cxml

Advanced schema-aware streaming XML parser


License
MIT
Install
npm install @wikipathways/cxml@0.2.14


cxml

NOTE: The master branch of the upstream repository for this project did not compile, and it did not support XPath queries. This fork fixes the build on master, extends the test suite and adds XPath support.


Atom screenshot

cxml aims to be the most advanced schema-aware streaming XML parser for JavaScript and TypeScript. It fully supports namespaces, derived types and substitution groups. It can handle complex schemas such as GML and WFS, as well as the INSPIRE extensions to them. Output is fully typed and structured according to the actual meaning of the input data, as defined in the schema.

Introduction

For example, this XML:

<dir name="123">
	<owner>me</owner>
	<file name="test" size="123">
		data
	</file>
</dir>

can become this JSON (run npm test to see it happen):

{
  "dir": {
    "name": "123",
    "owner": "me",
    "file": [
      {
        "name": "test",
        "size": 123,
        "content": "data"
      }
    ]
  }
}

Note the following:

  • "123" can be a string or a number depending on the context.
  • The name attribute and owner child element are represented in the same way.
  • A dir has a single owner but can contain many files, so file is an array but owner is not.
  • Output data types are as simple as possible while correctly representing the input.
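To make the notes above concrete, here is a toy sketch of schema-directed conversion. It is not how cxml works internally: the `TypeSpec` format, the `convert` function and the flattened (name, raw value) input pairs are all hypothetical. It only shows how knowing each member's declared type and whether it may repeat lets the same text "123" become a string in one place and a number in another, and makes repeatable members always come out as arrays.

```typescript
// Hypothetical, simplified schema description. cxml's real type
// information (generated by cxsd) looks different; illustration only.
type Primitive = "string" | "number";

interface MemberSpec {
  type: Primitive | TypeSpec; // primitive text, or a nested complex type
  multiple: boolean;          // true if the member may occur more than once
}

interface TypeSpec {
  [member: string]: MemberSpec;
}

// Attributes and child elements flattened into (name, raw value) pairs,
// as a schema-unaware parser might report them. Nested elements carry
// their own list of pairs.
type RawPair = [string, string | RawPair[]];

function convert(spec: TypeSpec, pairs: RawPair[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [name, raw] of pairs) {
    const member = spec[name];
    if (!member) continue; // this sketch silently skips unknown members

    let value: unknown;
    if (typeof member.type === "object") {
      value = convert(member.type, raw as RawPair[]);
    } else if (member.type === "number") {
      value = Number(raw);
    } else {
      value = (raw as string).trim();
    }

    if (member.multiple) {
      // Repeatable members always become arrays, even with one occurrence.
      const list = (out[name] as unknown[] | undefined) ?? [];
      list.push(value);
      out[name] = list;
    } else {
      out[name] = value;
    }
  }
  return out;
}

// Hypothetical specs mirroring the dir example.
const fileSpec: TypeSpec = {
  name: { type: "string", multiple: false },
  size: { type: "number", multiple: false },
  content: { type: "string", multiple: false },
};

const dirSpec: TypeSpec = {
  name: { type: "string", multiple: false },
  owner: { type: "string", multiple: false },
  file: { type: fileSpec, multiple: true },
};

const doc = convert(dirSpec, [
  ["name", "123"],
  ["owner", "me"],
  ["file", [["name", "test"], ["size", "123"], ["content", "\n\t\tdata\n\t"]]],
]);

console.log(JSON.stringify({ dir: doc }));
// {"dir":{"name":"123","owner":"me","file":[{"name":"test","size":123,"content":"data"}]}}
```

The point of the sketch is that none of these decisions can be made from the XML text alone; they all come from the schema.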

See the example schema that makes it happen. Schemas for formats like GML and SVG are nastier, but you don't have to look at them to use them through cxml.

Relevant schema files should be downloaded and compiled with cxsd before they are used to parse documents. Check out the example schema converted to TypeScript.

There's much more. What if we parse an empty dir?

import * as cxml from 'cxml';
import * as example from 'cxml/test/xmlns/dir-example';

const parser = new cxml.Parser();

const result = parser.parse('<dir name="empty"></dir>', example.document);

Now we can print the result and try some magical features:

result.then((doc: example.document) => {

    console.log( JSON.stringify(doc) );  // {"dir":{"name":"empty"}}
    const dir = doc.dir;

    console.log( dir instanceof example.document.dir.constructor );   // true
    console.log( dir instanceof example.document.file.constructor );  // false

    console.log( dir instanceof example.DirType );   // true
    console.log( dir instanceof example.FileType );  // false

    console.log( dir._exists );          // true
    console.log( dir.file[0]._exists );  // false (not an error!)

});

Unseen in the JSON output, every object is an instance of a constructor for the appropriate XSD schema type. Its prototype also contains placeholders for valid children, which means you can refer to a.b.c.d._exists even if a.b doesn't exist. This saves redundant checks when only the existence of a deeply nested item matters. The magical _exists flag is true in the prototypes and false in the placeholder instances, so real objects simply inherit it and it consumes no memory per object.
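The placeholder trick can be sketched in plain TypeScript. The classes `Base`, `AType` to `DType` and the `placeholder` helper are hypothetical stand-ins, not cxml's generated code; the sketch only assumes what the paragraph above describes: `_exists` lives on prototypes, and each type's prototype holds a shared placeholder for every valid child.

```typescript
// Hypothetical classes illustrating the placeholder technique; not cxml's
// generated code.
class Base {
  // `declare` emits no per-instance field, so instances inherit the value
  // stored on the prototype below instead of shadowing it.
  declare _exists: boolean;
}
Base.prototype._exists = true; // every real instance reports true for free

// Build a placeholder for a type: it inherits everything from the type's
// prototype (including placeholders for its own children) but overrides
// _exists with false. Frozen, so accidental writes to this object, which is
// shared by every parent, are rejected.
function placeholder<T extends Base>(proto: T): T {
  const p: T = Object.create(proto);
  p._exists = false;
  Object.freeze(p);
  return p;
}

class DType extends Base {}
class CType extends Base { declare d: DType; }
class BType extends Base { declare c: CType; }
class AType extends Base { declare b: BType; }

// One shared placeholder per child slot, stored on the parent's prototype.
CType.prototype.d = placeholder(DType.prototype);
BType.prototype.c = placeholder(CType.prototype);
AType.prototype.b = placeholder(BType.prototype);

const a = new AType();
console.log(a._exists);             // true (inherited from Base.prototype)
console.log(a.b.c.d._exists);       // false, and no TypeError on the way
console.log(Object.keys(a).length); // 0: nothing stored per instance

a.b = new BType();                  // a real child shadows the placeholder
console.log(a.b._exists, a.b.c._exists); // true false
```

Because the placeholders are shared and frozen, a deep existence check costs only prototype lookups, and parsed objects carry no extra bookkeeping fields.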

We can also process data as soon as the parser sees it in the incoming stream:

parser.attach(class DirHandler extends (example.document.dir.constructor) {

    /** Fires when the opening <dir> and attributes have been parsed. */

    _before() {
        console.log('Before ' + this.name + ': ' + JSON.stringify(this));
    }

    /** Fires when the closing </dir> and children have been parsed. */

    _after() {
        console.log('After  ' + this.name + ': ' + JSON.stringify(this));
    }

});
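The order in which `_before` and `_after` fire can be illustrated with a toy driver. Everything here (`XmlEvent`, `Hooks`, `run`, and the hand-written event list standing in for a tokenizer) is hypothetical and far simpler than cxml: it copies attributes onto a fresh handler instance when a tag opens and calls `_after` when the tag closes, but unlike cxml it does not build typed objects or attach parsed children to their parent.

```typescript
// Hypothetical event stream a tokenizer might produce; cxml's real
// internals are different and also handle text, namespaces and types.
type XmlEvent =
  | { kind: "open"; name: string; attrs: Record<string, string> }
  | { kind: "close"; name: string };

// Optional hooks a handler class may define, named after cxml's.
interface Hooks {
  _before?(): void;
  _after?(): void;
}

type HandlerClass = new () => Hooks;

function run(events: XmlEvent[], handlers: Record<string, HandlerClass>): void {
  const stack: Hooks[] = [];
  for (const ev of events) {
    if (ev.kind === "open") {
      const Ctor: HandlerClass | undefined = handlers[ev.name];
      const obj: Hooks = Ctor ? new Ctor() : {};
      Object.assign(obj, ev.attrs); // attributes are complete once the tag opens
      obj._before?.();
      stack.push(obj);
    } else {
      // Any children were fully processed before this close event arrived.
      const obj = stack.pop()!;
      obj._after?.();
    }
  }
}

const log: string[] = [];

class DirHandler {
  name?: string;
  _before() { log.push("before " + this.name); }
  _after() { log.push("after " + this.name); }
}

run(
  [
    { kind: "open", name: "dir", attrs: { name: "123" } },
    { kind: "open", name: "owner", attrs: {} },
    { kind: "close", name: "owner" },
    { kind: "close", name: "dir" },
  ],
  { dir: DirHandler },
);

console.log(log.join(", ")); // before 123, after 123
```

A stack is enough to pair each close event with its open event, which is why hooks can fire while the document is still streaming in.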

The best part: your code is fully typed with comments pulled from the schema! See the screenshot at the top.

Related projects

  • node-xml4js uses schema information to read XML into nicely structured objects.

License

The MIT License

Copyright (c) 2016-2017 BusFaster Ltd