NER_eval
A simple implementation of strict/lenient matching to evaluate NER performance (precision, recall, F1-score) in 60 lines!
The script currently supports only the IOB2 tagging format, with both strict and lenient matching modes.
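For reference, here is a minimal, self-contained sketch (not the repository's actual implementation) of how strict and lenient span matching over IOB2 tags can be computed; the function names iob2_spans, spans_match, and per_tag_scores are illustrative and not part of ner_metrics:

def iob2_spans(tags):
    """Extract (label, start, end) entity spans from an IOB2 tag sequence."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            if label is not None:
                spans.append((label, start, i - 1))
            label, start = None, None
        elif tag.startswith("B-") or label != tag[2:]:
            # "B-" always opens a new entity; a stray "I-" with a new label does too
            if label is not None:
                spans.append((label, start, i - 1))
            label, start = tag[2:], i
        # an "I-" continuing the current label just extends the open span
    if label is not None:
        spans.append((label, start, len(tags) - 1))
    return spans

def spans_match(t, p, mode):
    """True if a true and a predicted (label, start, end) span match under the given mode."""
    if t[0] != p[0]:
        return False
    if mode == "strict":                     # strict: boundaries must match exactly
        return (t[1], t[2]) == (p[1], p[2])
    return p[1] <= t[2] and t[1] <= p[2]     # lenient: any overlap counts

def per_tag_scores(tags_true, tags_pred, mode="strict"):
    """Per-label precision / recall / F1 from two IOB2 tag sequences."""
    true_spans, pred_spans = iob2_spans(tags_true), iob2_spans(tags_pred)
    labels = {s[0] for s in true_spans} | {s[0] for s in pred_spans}
    scores = {}
    for label in sorted(labels):
        t = [s for s in true_spans if s[0] == label]
        p = [s for s in pred_spans if s[0] == label]
        tp_pred = sum(any(spans_match(ts, ps, mode) for ts in t) for ps in p)
        tp_true = sum(any(spans_match(ts, ps, mode) for ps in p) for ts in t)
        precision = tp_pred / len(p) if p else 0.0
        recall = tp_true / len(t) if t else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1-score": f1}
    return scores

Run on the example tag sequences in the Usage section below, this sketch reproduces the scores listed under Expected output.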
Installation
pip install ner_metrics
Usage
from ner_metrics import classification_report
y_true = ['B-PER', 'I-PER', 'O', 'B-ORG', 'B-ORG', 'O', 'O', 'B-PER', 'I-PER', 'O']
y_pred = ['O', 'B-PER', 'O', 'B-ORG', 'B-ORG', 'I-ORG', 'O', 'B-PER', 'I-PER', 'O']
classification_report(tags_true=y_true, tags_pred=y_pred, mode="lenient") # for lenient match
classification_report(tags_true=y_true, tags_pred=y_pred, mode="strict") # for strict match
Expected output
tag(lenient): PER precision:1.0 recall:1.0 f1-score:1.0
tag(strict): PER precision:0.5 recall:0.5 f1-score:0.5
tag(lenient): ORG precision:1.0 recall:1.0 f1-score:1.0
tag(strict): ORG precision:0.5 recall:0.5 f1-score:0.5
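In this example, the predicted PER entity at position 1 only partially overlaps the true PER entity spanning positions 0-1, and the predicted ORG entity spanning positions 4-5 extends past the true single-token ORG entity at position 4. Lenient matching counts any overlapping span with the correct label as a hit (giving 1.0 for both tags), while strict matching requires exact boundaries (giving 0.5).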
The results are also saved to evaluation.json
How to cite this work
If you find this git repo useful, please consider citing it using the snippet below:
@misc{ner_eval,
author={Le Peng},
title={ner_metrics: A Simple Python Snippet for NER Evaluation},
howpublished={\url{https://github.com/PL97/NER_eval}},
year={2022}
}