transition-neural-parser

The package for the transition-based neural AMR parser.


Keywords
abstract-meaning-representation, amr, amr-graphs, amr-parser, amr-parsing, machine-learning, nlp, semantic-parsing
License
Apache-2.0
Install
pip install transition-neural-parser==0.5.12.1005

Documentation

stack-lstm-amr-parser

PyTorch implementation of a transition-based parser for Abstract Meaning Representation (AMR), based on Stack-LSTMs.

The current code implements the original Stack-LSTM parser for AMR from

  • Ballesteros, Miguel, and Yaser Al-Onaizan. AMR Parsing using Stack-LSTMs. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. (EMNLP 2017)

with some improvements.

Code developed mainly by Miguel Ballesteros and Austin Blodgett while at IBM.

Install

The code has been tested on Python 3.6.

cd stack-lstm-amr-parser
pip install -r requirements.txt

You also need to download the spaCy English model for lemmatization:

python -m spacy download en

and the SMATCH tool to compute scores:

git clone https://github.com/snowblink14/smatch.git
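SMATCH compares a parsed AMR with the gold AMR as sets of triples and reports precision, recall and F1 from the number of triples that match under the best variable mapping it finds (by hill-climbing search). The sketch below shows only that final arithmetic; `smatch_prf` is a hypothetical helper, not part of the SMATCH code, and the matched-triple count is assumed to come from the tool's search.

```python
from typing import Tuple


def smatch_prf(num_matched: int, num_test: int, num_gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from triple counts (hypothetical helper).

    num_matched: triples shared by the parsed and gold AMR under the best
                 variable mapping (assumed to be found by SMATCH's search)
    num_test:    total triples in the parsed (test) AMR
    num_gold:    total triples in the gold AMR
    """
    precision = num_matched / num_test if num_test else 0.0
    recall = num_matched / num_gold if num_gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Example: 8 of the parser's 10 triples match a gold graph with 12 triples.
p, r, f = smatch_prf(8, 10, 12)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

With these counts, F1 reduces to 2 × 8 / (10 + 12) = 8/11, about 0.727.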

You can use virtualenv and pyenv, or conda, to isolate packages and Python versions and to work without root access.

Data

The general AMR training data is available from LDC (LDC2017T10). You will need to reformat the alignments to match the JAMR style (see the sample data file data/wiki25.jkaln). Files in data/ are provided under the CC-SA 4.0 license. We also provide a sample of the corresponding BERT embeddings in data/.
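To make "JAMR style" concrete, the sketch below reads the token, node and edge metadata lines of one AMR block. It assumes the tab-separated layout that JAMR writes (`# ::node` with id, concept and token span; `# ::edge` with source concept, label, target concept, source id and target id). `read_jamr_block` is a hypothetical helper, not the JAMR_CorpusReader in amr.py, and the .jkaln files may carry fields this sketch ignores.

```python
from typing import Dict, List, Optional, Tuple


def read_jamr_block(lines: List[str]) -> dict:
    """Parse '# ::tok', '# ::node' and '# ::edge' lines of one JAMR-style block.

    Assumed layout (tab-separated after the '# ::node' / '# ::edge' tag):
      # ::tok   token1 token2 ...
      # ::node  <node_id>  <concept>  [<start>-<end> token span]
      # ::edge  <src_concept>  <label>  <tgt_concept>  <src_id>  <tgt_id>
    """
    tokens: List[str] = []
    nodes: Dict[str, Tuple[str, Optional[Tuple[int, int]]]] = {}
    edges: List[Tuple[str, str, str]] = []  # (source id, label, target id)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# ::tok"):
            tokens = line[len("# ::tok"):].split()
        elif line.startswith("# ::node\t"):
            fields = line.split("\t")
            node_id, concept = fields[1], fields[2]
            span = None  # unaligned nodes have no span field
            if len(fields) > 3 and fields[3]:
                start, end = fields[3].split("-")
                span = (int(start), int(end))
            nodes[node_id] = (concept, span)
        elif line.startswith("# ::edge\t"):
            fields = line.split("\t")
            label, src_id, tgt_id = fields[2], fields[4], fields[5]
            edges.append((src_id, label, tgt_id))
    return {"tokens": tokens, "nodes": nodes, "edges": edges}


sample = [
    "# ::tok The boy wants to go",
    "# ::node\t0\twant-01\t2-3",
    "# ::node\t0.0\tboy\t1-2",
    "# ::node\t0.1\tgo-01\t4-5",
    "# ::edge\twant-01\tARG0\tboy\t0\t0.0",
    "# ::edge\twant-01\tARG1\tgo-01\t0\t0.1",
    "# ::edge\tgo-01\tARG0\tboy\t0.1\t0.0",
]
parsed = read_jamr_block(sample)
print(parsed["nodes"]["0.0"])
print(parsed["edges"])
```

Edges are keyed by node id rather than concept, because the same concept string can label several nodes in one graph.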

Pre-processing Instructions

After downloading the LDC 2017 data, you can preprocess it as follows. The scripts will build JAMR and Kevin alignments using the same tokenization and then merge them together. You must have the following installed: pip, g++, and ICU (http://site.icu-project.org/home).

cd preprocess
bash preprocess.sh path/to/ldc_data
rm train.* dev.* test.*

New files will be placed in the data folder. The process will take ~1 hour to run.

Test Run

This uses the sample data (the training set is the same as the dev set):

python learn.py -A data/wiki25.jkaln -a data/wiki25.jkaln -B data/wiki25.bert_max_cased.hdf5 -b data/wiki25.bert_max_cased.hdf5 --name my-model

More information

Action set

The transition-based parser operates using 10 actions:

  • SHIFT : move buffer0 to stack0
  • REDUCE : remove the token at stack0 from the stack
  • CONFIRM : assign a node concept
  • SWAP : move stack1 back to the buffer
  • LA(label) : make stack0 the parent of stack1, with edge label
  • RA(label) : make stack1 the parent of stack0, with edge label
  • ENTITY(type) : form a named entity
  • MERGE : merge two tokens (for MWEs)
  • DEPENDENT(edge,node) : add a node that is a dependent of stack0
  • CLOSE : complete the AMR and run post-processing
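The stack and buffer actions above can be sketched as a small state machine. This is a hypothetical toy (`ToyAMRState`), not the repository's state_machine.py: it covers only SHIFT, REDUCE, SWAP, LA and RA, treats tokens directly as concepts (no CONFIRM), and assumes that LA/RA only add an edge and leave both nodes on the stack, since an AMR node can have several parents, so that REDUCE is what pops stack0. It also assumes SWAP puts stack1 at the front of the buffer.

```python
from typing import List, Tuple


class ToyAMRState:
    """Hypothetical minimal stack/buffer state for part of the action set."""

    def __init__(self, tokens: List[str]) -> None:
        # buffer[0] is buffer0; stack[-1] is stack0 and stack[-2] is stack1.
        self.buffer: List[str] = list(tokens)
        self.stack: List[str] = []
        self.edges: List[Tuple[str, str, str]] = []  # (parent, label, child)

    def apply(self, action: str, label: str = "") -> None:
        if action == "SHIFT":      # move buffer0 to stack0
            self.stack.append(self.buffer.pop(0))
        elif action == "REDUCE":   # remove stack0 from the stack
            self.stack.pop()
        elif action == "SWAP":     # move stack1 back to the front of the buffer (assumed)
            self.buffer.insert(0, self.stack.pop(-2))
        elif action == "LA":       # stack0 becomes parent of stack1; nothing is popped (assumed)
            self.edges.append((self.stack[-1], label, self.stack[-2]))
        elif action == "RA":       # stack1 becomes parent of stack0; nothing is popped (assumed)
            self.edges.append((self.stack[-2], label, self.stack[-1]))
        else:
            raise ValueError(f"action not covered by this sketch: {action}")


state = ToyAMRState(["boy", "want-01"])
for action, label in [("SHIFT", ""), ("SHIFT", ""), ("LA", "ARG0")]:
    state.apply(action, label)
print(state.edges)
print(state.stack)
```

Keeping both nodes on the stack after LA/RA is what lets a later action attach a second parent to the same node (a reentrancy).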

Files

amr.py : Contains a basic AMR class and a JAMR_CorpusReader class for reading AMRs in JAMR format.

state_machine.py : Implements the AMR state machine with a stack and a buffer.

data_oracle.py : Implements the oracle that assigns gold actions.

learn.py : Runs the parser (use learn.py --help for options).

stack_lstm.py : Implements the Stack-LSTM.

entity_rules.json : Stores the rules applied by the ENTITY action.
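The idea behind the Stack-LSTM in stack_lstm.py is a stack whose summary is the recurrent state at its top: a push runs the cell once from the current top state, and a pop returns to the previous state without recomputing anything. The sketch below is a hypothetical, framework-free version (`StackRNN` is not the repository's class) with the cell passed in as a function; for simplicity it discards popped states. In PyTorch, an `nn.LSTMCell` instance fits the `cell` slot, since it is called as `cell(input, (h, c))` and returns `(h, c)`.

```python
from typing import Callable, Generic, List, TypeVar

S = TypeVar("S")  # recurrent state type, e.g. an (h, c) pair of tensors
X = TypeVar("X")  # input type, e.g. a token embedding


class StackRNN(Generic[S, X]):
    """Hypothetical stack of recurrent states; the top state summarizes the stack."""

    def __init__(self, cell: Callable[[X, S], S], initial_state: S) -> None:
        self.cell = cell
        self.states: List[S] = [initial_state]  # states[0] encodes the empty stack

    def push(self, x: X) -> None:
        # One cell step from the current top state: O(1) per push.
        self.states.append(self.cell(x, self.states[-1]))

    def pop(self) -> S:
        # Restores the previous top state without recomputation.
        if len(self.states) == 1:
            raise IndexError("pop from empty stack")
        return self.states.pop()

    def summary(self) -> S:
        return self.states[-1]

    def __len__(self) -> int:
        return len(self.states) - 1


# Toy cell (a running sum) standing in for an LSTM cell.
stack = StackRNN(lambda x, s: s + x, 0)
stack.push(3)
stack.push(4)
print(stack.summary())
stack.pop()
print(stack.summary())
```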