CitationParser

The CitationParser (CiPa) is composed of a lexer, a parser, and a tree parser, all written in ANTLR and compiled into Python code.

The idea behind CiPa is pretty simple. Canonical citations use punctuation symbols in a consistent way, so we can define a syntax to extract their meaning. Once extracted, the meaning is formalised into JSON as an intermediate representation format.

Given, for example, the citation "Hom. Il. 1, 124 - 125", a human reader knows the following facts:

the hyphen specifies a range of text passages, from X to Y
the character string preceding the numbers identifies the author and work being cited
the semicolon separates one reference from another within the same citation (it is common to chain together references to multiple passages of the same work or of different works)
the comma separates the hierarchical levels of the work being cited. In the example above, 1,124-5 stands for "from Book 1, Line 124 to Book 1, Line 125"
when the citation scope is a range, identical hierarchical levels are collapsed: 1.124 - 1.125 can be written as 1.124-125 or 1.124 s. without any loss of information for the human reader
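The collapsing convention described above can be undone mechanically. The following is a minimal sketch, not CiPa's actual code (CiPa does this work in its ANTLR grammars); the function name expand_scope and the rule for abbreviated digits (124-5 meaning 124-125) are assumptions made for illustration.

```python
def expand_scope(start, end):
    """Restore the levels and digits omitted from the end of a range.

    `start` and `end` are lists of hierarchical levels, most significant
    first, e.g. ['1', '124'] for Book 1, Line 124.
    """
    if not end or len(end) > len(start):
        raise ValueError("end must have between 1 and len(start) levels")

    # Levels missing from the end are inherited from the start of the range.
    missing = len(start) - len(end)
    first = end[0]
    aligned = start[missing]

    # Assumed rule: a shorter end number that is smaller than the start
    # number abbreviates its leading digits, so 124-5 means 124-125.
    if (first.isdigit() and aligned.isdigit()
            and len(first) < len(aligned) and int(first) < int(aligned)):
        first = aligned[:len(aligned) - len(first)] + first

    return start[:missing] + [first] + end[1:]


print(expand_scope(['1', '124'], ['125']))  # ['1', '125']
print(expand_scope(['1', '124'], ['5']))    # ['1', '125']
```

An end that already spells out every level, such as ['2', '3'], is returned unchanged.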

So, given the input Hom. Il. 1, 124 - 125, the output of the citation parser, expressed in JSON, is: [{'work': u'Hom. Il.', 'scp': {'start': ['1', '124'], 'end': ['1', '125']}}]
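To make the mapping from input to output concrete, here is a rough sketch that produces the same structure for a single reference using a regular expression. It only illustrates the idea and is not how CiPa works internally, since CiPa relies on the ANTLR lexer, parser, and tree parser. The names REFERENCE, split_levels, and parse_reference are hypothetical, and treating a citation without a range as start equal to end is an assumption.

```python
import json
import re

# Hypothetical pattern, for illustration only: work abbreviation, then a
# start scope, then an optional "- end" scope.
REFERENCE = re.compile(
    r"^\s*(?P<work>.*?)\s*"
    r"(?P<start>\d+(?:\s*[,.]\s*\d+)*)"
    r"(?:\s*-\s*(?P<end>\d+(?:\s*[,.]\s*\d+)*))?\s*$"
)


def split_levels(scope):
    """Split a scope such as '1, 124' into its levels ['1', '124']."""
    return re.split(r"\s*[,.]\s*", scope.strip())


def parse_reference(citation):
    """Parse one reference into a dict with 'work' and 'scp' keys."""
    match = REFERENCE.match(citation)
    if match is None:
        raise ValueError("not a recognisable reference: %r" % citation)
    start = split_levels(match.group("start"))
    # Assumption: without a range, the scope starts and ends at one passage.
    end = split_levels(match.group("end")) if match.group("end") else list(start)
    # Levels omitted from the end are inherited from the start.
    missing = max(len(start) - len(end), 0)
    end = start[:missing] + end
    return {"work": match.group("work"), "scp": {"start": start, "end": end}}


result = [parse_reference("Hom. Il. 1, 124 - 125")]
print(json.dumps(result))
```

Serialising with json.dumps yields strict JSON with double-quoted strings. This sketch does not handle semicolon-separated chains of references or the "s." abbreviation.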

Compile the ANTLR grammar files

From the directory ./citation_parser/antlr/, run:

java -cp ../../lib/antlr-3.1.2.jar org.antlr.Tool -o ~/Downloads/ cp_lexer.g cp_parser.g cp_treeparser.g

citation-parser
Release 0.4.1
