bioeval

CoNLL-2000 style evaluation of data that uses the BIO and BEISO representations for multi-token entities (i.e. chunks).

Install

Install the package from PyPI:

pip install bioeval

Change Log

PyPI release and automated CI releases.
bioeval now supports pandas DataFrame objects through bioeval.evaluate_df.

Usage

The library offers two ways to evaluate span annotations: the native input format and a pandas DataFrame format.

Native input format

The native input format is a set of tuples, where each tuple holds the group of tokens in one span. Each token is itself a tuple and must be unique. One way to guarantee uniqueness is to add a unique identifier to each token, as in the example below.

from bioeval import evaluate


# gold chunks
chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'B-MV'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP'),
        (7, 'square', 'N', 'I-NP')
    ),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((11, '.', '.', 'O'),)
}

# candidate chunks
guess_chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'I-NP'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP')
    ),
    ((7, 'square', 'N', 'O'),),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((11, '.', '.', 'O'),)
}

# evaluation
f1, pr, re = evaluate(gold_sequence=chunk, guess_sequence=guess_chunk, chunk_col=3)
print(f1)
# 71.43
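To make the score easier to interpret, here is a minimal sketch of chunk-level precision, recall and F1 over sets of chunks. It is not bioeval's implementation: the function name chunk_f1 is hypothetical, and it assumes that chunks whose first token is tagged O are ignored and that a guessed chunk counts as correct only when it matches a gold chunk exactly.

```python
def chunk_f1(gold, guess, chunk_col):
    """Illustrative chunk-level scoring (hypothetical helper, not bioeval's code)."""

    def labelled(chunks):
        # Assumption: chunks whose first token carries the 'O' tag are not scored.
        return {chunk for chunk in chunks if chunk[0][chunk_col] != 'O'}

    gold, guess = labelled(gold), labelled(guess)
    correct = len(gold & guess)  # exact matches: same tokens and same tags
    precision = 100 * correct / len(guess) if guess else 0.0
    recall = 100 * correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return round(f1, 2), round(precision, 2), round(recall, 2)


# Tokens are (id, word, tag) tuples, so the tag sits in column 2.
gold = {
    ((1, 'The', 'B-NP'), (2, 'red', 'I-NP'), (3, 'square', 'I-NP')),
    ((4, 'is', 'B-MV'),),
    ((5, 'boring', 'B-AP'),),
    ((6, '.', 'O'),),
}
guess = {
    ((1, 'The', 'B-NP'), (2, 'red', 'I-NP')),  # span cut short: no match
    ((3, 'square', 'O'),),
    ((4, 'is', 'B-MV'),),
    ((5, 'boring', 'B-AP'),),
    ((6, '.', 'O'),),
}

f1, pr, re_ = chunk_f1(gold, guess, chunk_col=2)
print(f1, pr, re_)
# 66.67 66.67 66.67
```

Under the same assumptions, the larger example above has 7 scored gold chunks, 7 scored candidate chunks and 5 exact matches, which is consistent with the reported 71.43.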

Dataframe format

The library supports DataFrame input through the evaluate_df function, which takes the additional chunkcol and guesscol parameters to name the columns that hold the gold and candidate tags.

import pandas as pd
from bioeval import evaluate_df

# input data as a list of records loaded into a DataFrame object
df = pd.DataFrame(
    [
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'I-foo','guesstag': 'I-foo'},
        {'chunktag': 'O','guesstag': 'O'},
        {'chunktag': 'B-bar','guesstag': 'B-bar'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'O','guesstag': 'O'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'I-foo','guesstag': 'I-foo'},
        {'chunktag': 'B-bar','guesstag': 'B-bar'},
        {'chunktag': 'I-bar','guesstag': 'I-bar'},
        {'chunktag': 'O','guesstag': 'O'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'B-bar','guesstag': 'I-foo'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'I-foo','guesstag': 'B-foo'}
    ]
)

f1, pr, re = evaluate_df(df=df, chunkcol='chunktag', guesscol='guesstag')

print(f1)
# 62.5
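The 62.5 can be checked by hand with a short sketch that turns each tag column into labelled spans and compares the two sets. This is an illustration under stated assumptions, not the code behind evaluate_df: the helper names bio_spans and span_f1 are hypothetical, the tags are plain lists rather than DataFrame columns, sentence boundaries are ignored, and an I- tag that does not continue an open span of the same label is treated as starting a new span.

```python
def bio_spans(tags):
    """Return a set of (start, end, label) spans from BIO tags; end is exclusive."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ['O']):  # sentinel 'O' closes the last span
        prefix, _, name = tag.partition('-')
        continues = prefix == 'I' and name == label
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if prefix == 'B' or (prefix == 'I' and start is None):
            start, label = i, name
    return spans


def span_f1(gold_tags, guess_tags):
    """Chunk-level F1, precision and recall as percentages (hypothetical helper)."""
    gold, guess = bio_spans(gold_tags), bio_spans(guess_tags)
    correct = len(gold & guess)  # a span is correct only if bounds and label match
    precision = 100 * correct / len(guess) if guess else 0.0
    recall = 100 * correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return f1, precision, recall


# Same tags as the chunktag and guesstag columns above.
gold_tags = ('B-foo I-foo O B-bar B-foo O B-foo I-foo '
             'B-bar I-bar O B-foo B-bar B-foo I-foo').split()
guess_tags = ('B-foo I-foo O B-bar B-foo O B-foo I-foo '
              'B-bar I-bar O B-foo I-foo B-foo B-foo').split()

f1, pr, re_ = span_f1(gold_tags, guess_tags)
print(f1, pr, re_)
# 62.5 62.5 62.5
```

Both columns yield 8 spans and 5 of them agree exactly, so precision and recall are both 5/8.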
