A Python library for named entity recognition evaluation


Keywords
named, entity, recognition, nlp, natural, language, processing, named-entity-recognition, python
License
MIT
Install
pip install mi-ner==0.6.1

Documentation

miNER

A Python library for NER (Named Entity Recognition) evaluation

With this library, you can evaluate NER performance separately for known entities (those that appear in the training data) and unknown entities (those that do not).

Support

  • Tagging schemes
    • IOB2
    • BIOES
    • BIOUL
  • Metrics
    • precision
    • recall
    • f1
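
To make the difference between the three tagging schemes concrete, here is a minimal sketch of how a tag sequence can be turned into entity spans. It is an illustration only, not miNER's actual implementation: the function name `extract_spans` and the constants `STARTS` and `ENDS` are hypothetical, and the handling of malformed sequences (stray `I-` tags are ignored) is an assumption.

```python
from typing import List, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]

# Hypothetical helper constants: prefixes that open an entity and prefixes
# that close one. IOB2 uses B/I, BIOES adds E/S, BIOUL uses L/U instead.
STARTS = {"B", "S", "U"}
ENDS = {"E", "L", "S", "U"}


def extract_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from one sentence of IOB2, BIOES, or BIOUL tags."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")  # "O" gives ("O", "", "")
        # An open entity continues only on I/E/L tags of the same type.
        continues = (
            start is not None and prefix in {"I", "E", "L"} and label == etype
        )
        if start is not None and not continues:
            spans.append((etype, start, i))  # close the entity before this token
            start, etype = None, None
        if prefix in STARTS:
            start, etype = i, label
        if start is not None and prefix in ENDS:
            spans.append((etype, start, i + 1))  # close including this token
            start, etype = None, None
    if start is not None:  # entity running to the end of the sentence
        spans.append((etype, start, len(tags)))
    return spans
```

For example, `extract_spans('B-PSN B-PSN O'.split())` yields two one-token `PSN` spans, while `extract_spans('B-PSN I-PSN O'.split())` yields a single two-token span. That difference is what separates a correct prediction from a wrong one in entity-level scoring.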

Requirements

  • Python 3
  • Cython

Installation

pip install cython  # must be run before `pip install mi-ner`
pip install mi-ner

Usage

Sample

>>> from miner import Miner
>>> answers = [
    'B-PSN O O B-LOC O O O O'.split(' '),
    'B-PSN I-PSN O O B-LOC I-LOC O O O O'.split(' '),
    'S-PSN O O S-PSN O O B-LOC I-LOC E-LOC O O O O'.split(' ')
]
>>> predicts = [
    'B-PSN O O B-LOC O O O O'.split(' '),
    'B-PSN B-PSN O O B-LOC I-LOC O O O O'.split(' '),
    'S-PSN O O O O O B-LOC I-LOC E-LOC O O O O'.split(' ')
]
>>> sentences = [
    '花子 さん は 東京 に 行き まし た'.split(' '),
    '山田 太郎 君 は 東京 駅 に 向かい まし た'.split(' '),
    '花子 さん と ボブ くん は 東京 スカイ ツリー に 行き まし た'.split(' '),
]
>>> knowns = {'PSN': ['花子'], 'LOC': ['東京']}  # known words (words included in training data)
>>> m = Miner(answers, predicts, sentences, knowns)
>>> m.default_report(True)

	precision    recall    f1_score   num
LOC	 1.000        1.000     1.000      3
PSN	 0.500        0.500     0.500      4
overall	 0.714        0.714     0.714      7
{'LOC': {'precision': 1.0, 'recall': 1.0, 'f1_score': 1.0, 'num': 3},
'PSN': {'precision': 0.5, 'recall': 0.5, 'f1_score': 0.5, 'num': 4},
'overall': {'precision': 0.7142857142857143, 'recall': 0.7142857142857143, 'f1_score': 0.7142857142857143, 'num': 7}}
>>> m.unknown_only_report(True)

	precision    recall    f1_score   num
LOC	 1.000        1.000     1.000      2
PSN	 0.000        0.000     0.000      2
overall	 0.500        0.500     0.500      4
{'LOC': {'precision': 1.0, 'recall': 1.0, 'f1_score': 1.0, 'num': 2},
'PSN': {'precision': 0.0, 'recall': 0.0, 'f1_score': 0, 'num': 2},
'overall': {'precision': 0.5, 'recall': 0.5, 'f1_score': 0.5, 'num': 4}}
>>> m.return_predict_named_entities()
{'known': {'LOC': ['東京'], 'PSN': ['花子'], 'overall': []},
'unknown': {'LOC': ['東京スカイツリー', '東京駅'], 'PSN': ['山田', '太郎'], 'overall': []}}
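
The numbers in the reports above can be checked by hand. The sketch below recomputes the overall scores from the sample using entity-level exact matching. It is not miNER's code: the entity tuples were written out manually from the sample tags, `is_unknown` and `prf` are hypothetical helper names, and the rule that an entity counts as known when its surface form is listed in `knowns` for its type is an assumption inferred from the sample output.

```python
from typing import Set, Tuple

# (sentence index, start token, end token exclusive, type, surface form)
Entity = Tuple[int, int, int, str, str]

# Entities read off the `answers` tags in the sample.
gold: Set[Entity] = {
    (0, 0, 1, "PSN", "花子"), (0, 3, 4, "LOC", "東京"),
    (1, 0, 2, "PSN", "山田太郎"), (1, 4, 6, "LOC", "東京駅"),
    (2, 0, 1, "PSN", "花子"), (2, 3, 4, "PSN", "ボブ"),
    (2, 6, 9, "LOC", "東京スカイツリー"),
}
# Entities read off the `predicts` tags in the sample.
pred: Set[Entity] = {
    (0, 0, 1, "PSN", "花子"), (0, 3, 4, "LOC", "東京"),
    (1, 0, 1, "PSN", "山田"), (1, 1, 2, "PSN", "太郎"), (1, 4, 6, "LOC", "東京駅"),
    (2, 0, 1, "PSN", "花子"), (2, 6, 9, "LOC", "東京スカイツリー"),
}
knowns = {"PSN": ["花子"], "LOC": ["東京"]}


def is_unknown(entity: Entity) -> bool:
    """Assumed rule: unknown means the surface form is not in knowns[type]."""
    return entity[4] not in knowns.get(entity[3], [])


def prf(gold_set: Set[Entity], pred_set: Set[Entity]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 with exact span-and-type matching."""
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# All entities: 5 of 7 predictions match, 5 of 7 answers are found (0.714).
overall = prf(gold, pred)
# Unknown only: 2 of 4 match on each side (0.5).
unknown = prf({e for e in gold if is_unknown(e)}, {e for e in pred if is_unknown(e)})
```

The two wrong predictions are `山田` and `太郎`, which were tagged as separate persons instead of the single entity `山田太郎`. The missed entity `ボブ` lowers recall. Both errors involve unknown persons, which is why `PSN` scores 0.000 in the unknown-only report.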

Methods

  • default_report(print_): returns the evaluation results for all named entities; if print_=True, also prints them.
  • known_only_report(print_): returns the evaluation results for known named entities only.
  • unknown_only_report(print_): returns the evaluation results for unknown named entities only.
  • return_predict_named_entities(): returns the named entities extracted from the predicted labels (predicts).
  • return_answer_named_entities(): returns the named entities extracted from the answer labels (answers).
  • return_miss_labelings(): returns the sentences that were mislabeled.
  • segmentation_score(mode): reports the percentage of predicted labels that match the answer labels; if mode is known or unknown, returns the labeling accuracy for known or unknown named entities only.

License

MIT