docria

Semi-structured Document Model


Keywords
computational-linguistics, document, nlp, semistructured-data, serialization-format, storage
License
Apache-2.0
Install
pip install docria==0.4.0

Documentation

Docria


Semi-structured document storage model library.

Why?

To make it possible to share, process, and transform large amounts of natural-language text from heterogeneous sources.

The most commonly available formats, such as TSV and JSON, are too flexible and are not self-describing. They do not natively define graph concepts such as nodes and edges. They also provide no language-independent representation of text spans whose Unicode offsets stay consistent across language boundaries. Compare Java, which indexes strings by UTF-16 code units, with Python 3, which indexes them by code points.
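The offset mismatch can be shown with a small standalone sketch. It is not part of docria's API, and the helper name `codepoint_to_utf16_offset` is hypothetical. Python 3 counts a character outside the Basic Multilingual Plane, such as an emoji, as one position. Java's `String` counts it as two UTF-16 code units, so every later offset differs by one:

```python
def codepoint_to_utf16_offset(text: str, cp_offset: int) -> int:
    """Hypothetical helper: convert a Python (code point) offset into the
    UTF-16 code unit offset that Java's String methods expect."""
    # Each UTF-16 code unit is 2 bytes; non-BMP characters encode as a
    # surrogate pair (4 bytes), so they count as two units.
    return len(text[:cp_offset].encode("utf-16-le")) // 2


text = "I 😀 NLP"
start = text.index("NLP")  # Python code point offset of "NLP"
print(start, codepoint_to_utf16_offset(text, start))  # prints "4 5"
```

A span stored as `(4, 7)` by a Python producer would therefore select the wrong characters if a Java consumer passed those numbers directly to `substring`. This is the cross-language drift that a format with well-defined span offsets is meant to avoid.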

Implementations