eXternally configurable REference and Non Named Entity Recognizer
https://corpling.uis.georgetown.edu/xrenner/
Usage:
xrenner.py [options] INFILE (> OUTFILE)
xrenner.py [options] *.conllu
Options:
-m, --model input model name in models/, default 'eng' -o, --output output format, default: sgml; alternatives: html, paula, webanno, webannotsv, conll, onto, unittest -x, --override specify a section model's override.ini file with alternative settings; e.g. OntoNotes or GUM for English -v, --verbose output run time and summary -r, --rulebased rule based operation, disable stochastic classifiers in selected model -d, --dump <FILE> dump all anaphor-antecedent candidate pairs to <FILE> to train classifiers -p, --procs NUM number of processes to run in parallel (only useful if running on multiple documents) -t, --test run unit tests and quit
--version print xrenner version and quit
More exotic options:
--oracle use external file with entity type predictions per token span (for integrating separate NER) --noseq do not use machine learning sequence tagger even when available
Input format:
1 Wikinews _ PROPN NNP _ 2 nsubj _ _ 2 interviews _ VERB VBZ _ 0 root _ _ 3 President _ NOUN NN _ 2 obj _ _ 4 of _ ADP IN _ 7 case _ _ 5 the _ DET DT _ 7 det _ _ 6 International _ PROPN NNP _ 7 amod _ _ 7 Brotherhood _ PROPN NNP _ 3 nmod _ _ 8 of _ ADP IN _ 9 case _ _ 9 Magicians _ PROPN NNPS _ 7 nmod _ _ 1 Wednesday _ PROPN NNP _ 0 root _ _ 2 , _ PUNCT , _ 4 punct _ _ 3 October _ PROPN NNP _ 4 compound _ _ 4 9 _ NUM CD _ 1 appos _ _ 5 , _ PUNCT , _ 6 punct _ _ 6 2013 _ NUM CD _ 4 nmod:tmod _ _
Format for external NER predictions when using --oracle option:
Download the repo and use the main xrenner.py script on an input file, or install from PyPI and import as a module:
> pip install xrenner
- python xrenner.py example_in.conll10 > example_out.sgml
- python xrenner.py -x GUM example_in.conll10 > example_out.sgml
- python xrenner.py -o conll example_in.conll10 > example_out.conll
- python xrenner.py -m eng -o conll *.conll10 (automatically names output files based on input files)
Note that by default, the English model is invoked (-m eng), and this model expects input in Universal Dependencies.
To use neural entity classification and machine learning coreference prediction with the English model, flair and xgboost must be installed (see requirements.txt)