A Python module with Program Synthesis techniques for NLP

Keywords: nlp, transducer, program-synthesis, oracle-learning, ostia, concept-lattice, regex-learning, pac-basis, concept-learning, finite-state-transducers, formal-concept-analysis
pip install psynlp==1.0.4



Program SYnthesis for NLP

PsyNLP is a Python library that aims to handle morphological inflection for any language in the form of an interpretable program. 🎉

Table of Contents

  1. Installation Guidelines
  2. Usage
  3. Repository structure
  4. Running the tests
  5. Contribution Guidelines
  6. License

Installation Guidelines

(Back to ToC)

  1. Clone the repository
$ git clone
  2. Go to the cloned repository
$ cd PsyNLP
  3. Install the dependencies
$ pip3 install -r requirements.txt

Alternatively, install the module directly with pip:

$ pip3 install psynlp


Usage

(Back to ToC)

With the power of argparse, the main script acts as the central entry point for running any of the pipelines, for any language and training data quality.

  • Help menu, for more details:
$ python3 -h
usage: [-h] [-p PIPELINE] [-l LANGUAGE] [-q QUALITY] [-v]

Runs one of the pipeline scripts, for a given language and quality.

optional arguments:
  -h, --help            show this help message and exit
  -p PIPELINE, --pipeline PIPELINE
                        Name of the pipeline file (Default: deterministic)
  -l LANGUAGE, --language LANGUAGE
                        Name of the language (Default: english)
  -q QUALITY, --quality QUALITY
                        Size of the training data (Default: low)
  -v, --verbose         Prints verbose output if specified
  • Running a pipeline (say, ostia) for a language (say, polish) and training data quality (say, high):
$ python3 -p ostia -l polish -q high
  • Get more debug-style output with verbose flags (up to 3):
# No verbose, just print the exact word-match accuracy
$ python3

# Verbose 1, print the expected and actual words
$ python3 -v

# Verbose 2, print the paths responsible for computing an inflection
$ python3 -vv

# Verbose 3, print debug details for PAC and OSTIA
$ python3 -vvv
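
The options in the help menu above can be reproduced with a small argparse setup. The following is a minimal sketch, not the project's actual script: the function name build_parser is hypothetical, and using action="count" for the verbose flag is an assumption based on the -v / -vv / -vvv examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options shown in the help menu.

    Hypothetical helper: the real script may be organised differently.
    """
    parser = argparse.ArgumentParser(
        description="Runs one of the pipeline scripts, "
                    "for a given language and quality."
    )
    parser.add_argument(
        "-p", "--pipeline", default="deterministic",
        help="Name of the pipeline file (Default: deterministic)",
    )
    parser.add_argument(
        "-l", "--language", default="english",
        help="Name of the language (Default: english)",
    )
    parser.add_argument(
        "-q", "--quality", default="low",
        help="Size of the training data (Default: low)",
    )
    # Assumption: counting repeated flags maps -v, -vv, -vvv to levels 1, 2, 3.
    parser.add_argument(
        "-v", "--verbose", action="count", default=0,
        help="Prints verbose output if specified",
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["-p", "ostia", "-l", "polish", "-q", "high", "-vv"])
print(args.pipeline, args.language, args.quality, args.verbose)
```

Counting the flag keeps a single option for all verbosity levels, so the pipeline code only needs to compare one integer against the level it wants to print at.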

Repository structure

(Back to ToC)

  • Base classes:

    The code for base classes can be found in the psynlp/core directory.

    • Contains implementations of PAC and other methods related to Formal Concept Analysis
    • Contains generic Transducer methods, like states and arcs
    • Contains the oracles that are used while computing the PAC basis in
    • Implementation of the well-known OSTIA algorithm, that uses
  • Pipelines:

    The code for the different pipelines can be found in the psynlp/pipelines directory.

    • Prediction based on pandas' groupby (deterministic clustering) and OSTIA RegExp matching
    • Prediction based on just the input-output tapes of OSTIA
    • Prediction based on PAC clusters and OSTIA RegExp matching
  • Helpers:

    The code for the different helpers can be found in the psynlp/helpers directory.

    • Monkey-patches some required verbose-related builtin functions
    • Includes functions that import training and testing data into different structures
    • Miscellaneous functions
    • Text-related functions such as inflecting, prefix, suffix, edit distance, etc.
  • Data:

    The psynlp/data directory contains all the training and testing data. The files are of the form:

    • {language}-train-{quality}
    • {language}-dev
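
The naming scheme above makes it easy to locate the files for a given language and quality. The following is a small sketch under stated assumptions: the helper names data_file_names and data_file_paths are hypothetical and not part of PsyNLP, and the default directory simply reuses the psynlp/data path mentioned above.

```python
from pathlib import Path
from typing import Tuple


def data_file_names(language: str, quality: str) -> Tuple[str, str]:
    """Return the (training, dev) file names for a language and quality.

    Follows the {language}-train-{quality} and {language}-dev patterns.
    """
    return f"{language}-train-{quality}", f"{language}-dev"


def data_file_paths(language: str, quality: str,
                    data_dir: str = "psynlp/data") -> Tuple[Path, Path]:
    """Join the file names with the data directory (assumed location)."""
    train_name, dev_name = data_file_names(language, quality)
    base = Path(data_dir)
    return base / train_name, base / dev_name


train_path, dev_path = data_file_paths("polish", "high")
print(train_path.as_posix())
print(dev_path.as_posix())
```

Only the training file depends on the quality; the dev file is shared across qualities for a language, which matches the two patterns listed.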

Running the tests

  1. Basic run to check the results:
  2. For debugging:
py.test -s --fulltrace

Contribution Guidelines

(Back to ToC)

Your contributions are always welcome! Please have a look at the contribution guidelines first. 🎉


License

(Back to ToC)

MIT License 2018 - Gaurav Sahu and Athitya Kumar.