tc_xml_python

Lightweight implementation of the Typecraft XML format in Python.


Keywords: tc_xml_python
License: MIT
Install: pip install tc_xml_python==0.3.0

Documentation

Typecraft Python

This repository contains an interlinear glossed text (IGT) model based on the Typecraft IGT format. It also provides a simple CLI for NLP tasks such as tokenizing and tagging, which interfaces with NLTK and with other tools such as TreeTagger.

Installation

pip install typecraft_python

Features

  • Parsing of the Typecraft XML format.
  • Manipulation of the Typecraft IGT model format.
    • Integrating with NLTK
    • Integrating with TreeTagger
  • Provides a CLI that can be used to load, convert and manipulate raw text and Typecraft XML files.

Usage

Usage: tpy [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  convert
  ntexts   This command lists the number of texts in a...
  raw
  xml

Examples

Load a raw text file, tokenize and tag it, and write the resulting XML to stdout:

$ tpy raw your_file.txt

To save the output to a file instead:

$ tpy raw your_file.txt -o output.xml
# or
$ tpy raw your_file.txt > output.xml

To tag using a specific tagger:

$ tpy raw your_file.txt --tagger=tree  # Tags using TreeTagger

To load a Typecraft XML file and tag it:

$ tpy xml your_file.xml --tag --tagger=nltk -o tagged_output.xml
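
The commands above can also be driven from a Python script. The following is a minimal sketch, not part of this package: build_tpy_command and run_tpy are hypothetical helper names, and the sketch uses only the subcommands and flags shown in the examples above (raw, xml, --tag, --tagger, -o). It assumes these flags can be combined freely, which the examples only partly confirm.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tpy_command(
    subcommand: str,
    input_path: str,
    output_path: Optional[str] = None,
    tagger: Optional[str] = None,
    tag: bool = False,
) -> List[str]:
    """Assemble an argument list for the tpy CLI (hypothetical helper)."""
    if subcommand not in ("raw", "xml"):
        raise ValueError("only the 'raw' and 'xml' subcommands are covered here")
    args = ["tpy", subcommand, input_path]
    if tag:
        # --tag appears in the examples only together with the xml subcommand.
        args.append("--tag")
    if tagger is not None:
        # Tagger names seen in the examples: "nltk" and "tree".
        args.append("--tagger=" + tagger)
    if output_path is not None:
        args.extend(["-o", output_path])
    return args


def run_tpy(args: List[str]) -> str:
    """Run the command and return its stdout; raises if tpy is missing or fails."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("tpy executable not found on PATH")
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, build_tpy_command("xml", "your_file.xml", output_path="tagged_output.xml", tagger="nltk", tag=True) yields the same arguments as the last shell example; passing that list to run_tpy executes it, provided tpy is installed and on the PATH.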

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.