ordtext

Parser engine for ordered text.


Keywords
text parser parsing
License
Apache-2.0
Install
pip install ordtext==0.1.1

Documentation

OrdText

Overview

Ordered data consists of a sequence of tokens that appear in a predictable order. Unlike fixed-field data, ordered data may contain tokens that are optional or that repeat a variable number of times. These two properties prevent a parser from relying on a token's positional index. An ordered data grammar guarantees only three things: the order in which the tokens appear, whether each token is optional, and how many times a given token may repeat before a different token appears.

An example of ordered text is a U.S. street address:

123 Broadway Ave., Apt. 15, Philadelphia, PA 19101

For an address to be considered valid, its elements (street number, street name, city, etc.) must appear in a consistent order. The apartment number is optional, but when it is present it must occupy the same position in the sequence.
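
The idea can be illustrated with a small standalone sketch. It does not use ordtext's API, which this overview does not describe; the Element class, the parse_ordered function, and the regular expressions are all hypothetical names chosen for illustration. The sketch assumes the address has already been split into comma-separated fields, and it treats the apartment as an optional element by giving it a minimum count of zero.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical grammar element (not part of ordtext)."""
    name: str
    pattern: str          # regex that a single token must fully match
    min_count: int = 1    # 0 makes the element optional
    max_count: int = 1    # values above 1 allow repetition


def parse_ordered(tokens, grammar):
    """Apply each element in order and return {name: [matched tokens]}."""
    result = {}
    pos = 0
    for element in grammar:
        matched = []
        # Consume tokens while they match and the repeat limit allows it.
        while (pos < len(tokens)
               and len(matched) < element.max_count
               and re.fullmatch(element.pattern, tokens[pos])):
            matched.append(tokens[pos])
            pos += 1
        if len(matched) < element.min_count:
            raise ValueError(f"expected {element.name} at position {pos}")
        if matched:
            result[element.name] = matched
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


# Assumed grammar for the example address; the patterns are simplifications.
ADDRESS_GRAMMAR = [
    Element("street", r"\d+ .+"),
    Element("apartment", r"Apt\. \d+", min_count=0),  # optional
    Element("city", r"[A-Za-z .]+"),
    Element("state_zip", r"[A-Z]{2} \d{5}"),
]

address = "123 Broadway Ave., Apt. 15, Philadelphia, PA 19101"
fields = [field.strip() for field in address.split(",")]
parsed = parse_ordered(fields, ADDRESS_GRAMMAR)
print(parsed)

# The same grammar accepts an address with no apartment field.
no_apartment = parse_ordered(
    ["123 Broadway Ave.", "Philadelphia", "PA 19101"], ADDRESS_GRAMMAR
)
print(no_apartment)
```

Note that the city lands at index 2 in the first address and at index 1 in the second, which is why the sketch matches by order rather than by position.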

A parser's grammar is defined by a sequence of grammar elements. Each element is applied in the order in which it appears in the sequence. Because a parser can itself serve as a grammar element, hierarchical ordered text can be parsed by using one or more parser instances as elements of another parser's grammar.
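
A rough sketch of this composition follows. It is not ordtext's implementation or API: the Token and Parser classes, their match method, and the BEGIN/END sample input are assumptions made purely to show how a parser that shares an interface with its leaf elements can be nested inside another parser.

```python
import re


class Token:
    """Hypothetical leaf element: matches min_count..max_count tokens."""

    def __init__(self, name, pattern, min_count=1, max_count=1):
        self.name = name
        self.regex = re.compile(pattern)
        self.min_count = min_count
        self.max_count = max_count

    def match(self, tokens, pos):
        matched = []
        while (pos < len(tokens)
               and len(matched) < self.max_count
               and self.regex.fullmatch(tokens[pos])):
            matched.append(tokens[pos])
            pos += 1
        if len(matched) < self.min_count:
            return None  # a required element is missing
        return matched, pos


class Parser:
    """Hypothetical parser that applies its elements in order.

    It exposes the same match() interface as Token, so a Parser can be
    placed in another Parser's element list.
    """

    def __init__(self, name, elements, min_count=1, max_count=1):
        self.name = name
        self.elements = elements
        self.min_count = min_count
        self.max_count = max_count

    def _match_once(self, tokens, pos):
        result = {}
        for element in self.elements:
            outcome = element.match(tokens, pos)
            if outcome is None:
                return None
            value, pos = outcome
            if value:
                result[element.name] = value
        return result, pos

    def match(self, tokens, pos=0):
        matched = []
        while pos < len(tokens) and len(matched) < self.max_count:
            outcome = self._match_once(tokens, pos)
            # Stop on failure, or if nothing was consumed (avoids looping).
            if outcome is None or outcome[1] == pos:
                break
            matched.append(outcome[0])
            pos = outcome[1]
        if len(matched) < self.min_count:
            return None
        return matched, pos


# Inner parser: a key followed by one to three numeric values.
item = Parser(
    "item",
    [Token("key", r"[a-z]+"), Token("value", r"\d+", max_count=3)],
    min_count=0,
    max_count=99,
)

# Outer parser: uses the inner parser as one of its grammar elements.
document = Parser(
    "document",
    [Token("header", r"BEGIN"), item, Token("footer", r"END")],
)

tokens = "BEGIN width 10 20 height 5 END".split()
matches, end = document.match(tokens)
print(matches[0]["item"])
```

Giving the leaf and the composite the same interface keeps the outer parser unaware of whether an element is a single token or an entire nested grammar.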

Bugs

If you run into bugs, you can file them in the issue tracker.