xrenner

eXternally configurable REference and Non Named Entity Recognizer

https://corpling.uis.georgetown.edu/xrenner/

Usage:

xrenner.py [options] INFILE (> OUTFILE)
xrenner.py [options] *.conllu

Options:

-m, --model input model name in models/, default 'eng' -o, --output output format, default: sgml; alternatives: html, paula, webanno, webannotsv, conll, onto, unittest -x, --override specify a section model's override.ini file with alternative settings; e.g. OntoNotes or GUM for English -v, --verbose output run time and summary -r, --rulebased rule based operation, disable stochastic classifiers in selected model -d, --dump <FILE> dump all anaphor-antecedent candidate pairs to <FILE> to train classifiers -p, --procs NUM number of processes to run in parallel (only useful if running on multiple documents) -t, --test run unit tests and quit

--version print xrenner version and quit

More exotic options:

--oracle use external file with entity type predictions per token span (for integrating separate NER) --noseq do not use machine learning sequence tagger even when available

Input format:

1   Wikinews    _   PROPN   NNP _   2   nsubj   _   _
2   interviews  _   VERB    VBZ _   0   root    _   _
3   President   _   NOUN    NN  _   2   obj _   _
4   of  _   ADP IN  _   7   case    _   _
5   the _   DET DT  _   7   det _   _
6   International   _   PROPN   NNP _   7   amod    _   _
7   Brotherhood _   PROPN   NNP _   3   nmod    _   _
8   of  _   ADP IN  _   9   case    _   _
9   Magicians   _   PROPN   NNPS    _   7   nmod    _   _

1   Wednesday   _   PROPN   NNP _   0   root    _   _
2   ,   _   PUNCT   ,   _   4   punct   _   _
3   October _   PROPN   NNP _   4   compound    _   _
4   9   _   NUM CD  _   1   appos   _   _
5   ,   _   PUNCT   ,   _   6   punct   _   _
6   2013    _   NUM CD  _   4   nmod:tmod   _   _

Format for external NER predictions when using --oracle option:

Soaking the Bowl in Boiling Water
2,4 object|5,7 substance
2,4 object|5,7 substance

Choose artwork with cool colors .
2,6 object|4,6 abstract
2,3 object|4,6 abstract

Installation:

Download the repo and use the main xrenner.py script on an input file, or install from PyPI and import as a module:

> pip install xrenner

Examples:

python xrenner.py example_in.conll10 > example_out.sgml
python xrenner.py -x GUM example_in.conll10 > example_out.sgml
python xrenner.py -o conll example_in.conll10 > example_out.conll
python xrenner.py -m eng -o conll *.conll10 (automatically names output files based on input files)

Note that by default, the English model is invoked (-m eng), and this model expects input in Universal Dependencies.

To use neural entity classification and machine learning coreference prediction with the English model, flair and xgboost must be installed (see requirements.txt)

Module usage:

from xrenner import Xrenner

xrenner = Xrenner()
# Get a parse in Universal Dependencies
my_conllx_result = some_parser.parse("John visited Spain. His visit went well.")

sgml_result = xrenner.analyze(my_conllx_result,"sgml")
print(sgml_result)