lexpr

A parser for simple logical expressions of identifiers


Keywords
bioinformatics, sequence, features
License
ISC
Install
pip install lexpr==0.1

Documentation

Lexpr: A simple logical expressions parser

Lexpr is a simple package containing a logical expressions parser developed using a Lark grammar.

The expressions may contain:

  • entity identifiers
  • the binary operators | (or), & (and)
  • the unary operator ! (not).
  • balanced pairs of round parentheses

Installation

The package can be installed using pip install lexpr.

Usage

A parser is created and used for parsing the text, as in the following example:

import lexpr
lp = lexpr.Parser()
lp.parse("(G1 & G2) | !G3")
#
# output:
#
#  Tree(
#    Token('RULE', 'start'),
#    [Tree(Token('RULE', 'entity'),
#      [Tree(Token('RULE', 'or_expr'),
#        [Tree(Token('RULE', 'entity'),
#          [Tree(Token('RULE', 'enclosed_expr'),
#            [Tree(Token('RULE', 'entity'),
#              [Tree(Token('RULE', 'and_expr'),
#                [Tree(Token('RULE', 'entity'), [Token('IDENTIFIER', 'G1')]),
#                 Tree(Token('RULE', 'entity'), [Token('IDENTIFIER', 'G2')])])]
#            )]
#          )]
#        ), Tree(Token('RULE', 'entity'),
#             [Tree(Token('RULE', 'not_expr'),
#               [Tree(Token('RULE', 'entity'), [Token('IDENTIFIER', 'G3')])]
#             )]
#           )]
#      )]
#    )]
#  )

In case of an invalid string is passed to the parser, an exception is raised:

import lexpr
lp = lexpr.Parser()
lp.parse("G1 &")
# raises LexprParserError, unbalanced expression
lp.parse("G1 & G$")
# raises LexprParserError, invalid character in identifier

Implementation

The grammar is contained in the file lexpr/data/lexpr.g. The parser is in the module lexpr/parser.py. Errors raised by the module are defined in lexpr/error.py and are instances of the class LexprError or its subclasses.

History

The package has been developed to support the parsing of the EGC format, for expressing expectations about the contents of prokaryotic genomes. In this format, groups of organisms can be combined using logical expressions of the form parsed by this package. The main implementation of the format is based on TextFormats, which, however does not support non-regular, indefinetly nested expressions, such as the logical expressions parsed here. Thus the parsing of this expressions has been developed separately in this package.

Acknowledgements

This package has been created in context of the DFG project GO 3192/1-1 “Automated characterization of microbial genomes and metagenomes by collection and verification of association rules”. The funders had no role in study design, data collection and analysis.