ROUGE-X

A fast Python implementation of the full set of ROUGE scores that produces the same results as the official ROUGE-1.5.5.pl Perl script. A Python wrapper for the script is also available.

Features

Full ROUGE support: Implements ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S and ROUGE-SU scores, with support for multiple references.
High speed: Pure Python implementation that does not spawn an external process.
Correctness: Produces exactly the same results as ROUGE-1.5.5.pl for all ROUGE scores in the single-document scenario. For multi-document evaluation, the results may differ slightly, because we directly average the scores across documents while the Perl script additionally applies bootstrap resampling.
Flexible and multilingual: ROUGE-X works only on language-agnostic tokens and treats a sentence as a list of tokens. Language-aware pre-processing and tokenization are left to the user, so you may tokenize each language with a different tool, such as nltk for English and jieba for Chinese.
Multiple options: Besides the Python implementation, we also provide an API to the Perl script for evaluating summaries programmatically.
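To make the scoring concrete, here is a minimal sketch of ROUGE-N with multiple references, written independently of rougex (all function names are hypothetical, and this is not the package's own code). It assumes that the sentences of a summary are joined into one token stream before n-grams are extracted, that clipped n-gram hits are pooled over all references of a document, and that precision and recall are averaged across documents before the balanced F-measure is taken.

```python
from collections import Counter
from typing import Dict, List, Sequence

Sentence = List[str]
Summary = List[Sentence]  # a summary is a list of tokenized sentences


def ngram_counts(summary: Summary, n: int) -> Counter:
    """Count n-grams, treating all sentences of a summary as one token stream."""
    tokens = [tok for sentence in summary for tok in sentence]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f_measure(p: float, r: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of p and r; alpha=0.5 gives the usual F1."""
    if p == 0.0 or r == 0.0:
        return 0.0
    return p * r / ((1 - alpha) * p + alpha * r)


def rouge_n_document(hyp: Summary, refs: Sequence[Summary], n: int) -> Dict[str, float]:
    """Precision/recall for one document, pooling hits over all its references."""
    hyp_counts = ngram_counts(hyp, n)
    hits = hyp_total = ref_total = 0
    for ref in refs:
        ref_counts = ngram_counts(ref, n)
        # Counter & Counter keeps min(count) per n-gram, i.e. clipped matches.
        hits += sum((hyp_counts & ref_counts).values())
        hyp_total += sum(hyp_counts.values())
        ref_total += sum(ref_counts.values())
    return {
        "p": hits / hyp_total if hyp_total else 0.0,
        "r": hits / ref_total if ref_total else 0.0,
    }


def rouge_n(hyps: Sequence[Summary], refs_per_doc: Sequence[Sequence[Summary]], n: int) -> Dict[str, float]:
    """Plain average of P and R over documents (no bootstrap), then F of the averages."""
    docs = [rouge_n_document(h, refs, n) for h, refs in zip(hyps, refs_per_doc)]
    p = sum(d["p"] for d in docs) / len(docs)
    r = sum(d["r"] for d in docs) / len(docs)
    return {"f": f_measure(p, r), "p": p, "r": r}
```

With the hypotheses and references lists from the Quick Start, rouge_n(hypotheses, references, 1) and rouge_n(hypotheses, references, 2) reproduce the rouge-1 and rouge-2 entries of the Python output shown there. The Perl wrapper's rouge-1 f of 0.53622 instead agrees with averaging the per-document F values (0.64 and 0.43243, the bounds of its f_conf_int), so the aggregation choice matters in the multi-document case.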

Installation

Install a stable version from PyPI.

pip install rougex

Or install the latest version from GitHub.

pip install git+https://github.com/li-plus/rougex.git@master

Configuration

If you want to use the ROUGE-1.5.5.pl script, you may need further configuration. Otherwise, you can skip this section.

Configure on Linux

Everything should work fine on Linux. No further configuration is needed.

Configure on Windows

Install Strawberry Perl and add its bin folder to PATH. Run perl --version in a command prompt; if it prints the version information, the installation succeeded.
Run pip show rougex and check the Location field to find where the package is installed, say d:\softwares\python38\lib\site-packages. Then regenerate the WordNet-2.0.exc.db file in the rougex package by running

cd /d d:\softwares\python38\lib\site-packages\rougex\RELEASE-1.5.5\data\
del WordNet-2.0.exc.db
perl WordNet-2.0-Exceptions/buildExeptionDB.pl ./WordNet-2.0-Exceptions ./smart_common_words.txt ./WordNet-2.0.exc.db
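If you prefer not to type the path by hand, the manual steps above can be scripted. The sketch below is a hypothetical helper, not part of rougex: it assumes the package keeps the Perl release under RELEASE-1.5.5/data as in the commands above, locates the installed package with importlib.util.find_spec, and runs the same buildExeptionDB.pl command from that directory. It requires perl on PATH.

```python
import importlib.util
import subprocess
from pathlib import Path
from typing import List


def rouge_data_dir(package: str = "rougex") -> Path:
    """Return <package dir>/RELEASE-1.5.5/data (layout assumed from the steps above)."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {package!r}")
    return Path(spec.origin).parent / "RELEASE-1.5.5" / "data"


def rebuild_command() -> List[str]:
    """Same arguments as the manual perl invocation, relative to the data dir."""
    return [
        "perl",
        "WordNet-2.0-Exceptions/buildExeptionDB.pl",
        "./WordNet-2.0-Exceptions",
        "./smart_common_words.txt",
        "./WordNet-2.0.exc.db",
    ]


def rebuild_exception_db(package: str = "rougex") -> None:
    """Delete the stale database and rebuild it with perl."""
    data_dir = rouge_data_dir(package)
    db_file = data_dir / "WordNet-2.0.exc.db"
    if db_file.exists():
        db_file.unlink()  # equivalent of `del WordNet-2.0.exc.db`
    # cwd=data_dir plays the role of `cd`; check=True raises if perl fails.
    subprocess.run(rebuild_command(), cwd=data_dir, check=True)
```

Calling rebuild_exception_db() once after installing the package would replace the three manual commands. Using a working directory instead of absolute paths keeps the perl arguments identical to the documented invocation.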

Configure on macOS

TODO: I will update it once I get a MacBook.

Quick Start

Evaluate the results using the pure Python implementation.

import rougex

# Pre-process and tokenize the summaries as you like
hypotheses = [
    ['how are you'.split(), 'i am fine'.split()],                       # document 1: hypothesis
    ['it is fine today'.split(), 'we won the football game'.split()],   # document 2: hypothesis
]
references = [
    [
        ['how do you do'.split(), 'fine thanks'.split()],   # document 1: reference 1
        ['how old are you'.split(), 'i am three'.split()],  # document 1: reference 2
    ],
    [
        ['it is sunny today'.split(), 'let us go for a walk'.split()],  # document 2: reference 1
        ['it is a terrible day'.split(), 'we lost the game'.split()],   # document 2: reference 2
    ]
]
# Start evaluation
scores = rougex.evaluate(hypotheses, references, rouge_n=(1,2,4), rouge_l=True,
    rouge_w=True, rouge_w_weight=1.2, rouge_s=True, rouge_su=True, skip_gap=4)
print(scores)
"""The output will be
{
    'rouge-1': {'f': 0.5362379555927943, 'p': 0.5555555555555556, 'r': 0.5182186234817814},
    'rouge-2': {'f': 0.20347597966879813, 'p': 0.2125, 'r': 0.19518716577540107},
    'rouge-4': {'f': 0.07692307692307691, 'p': 0.08333333333333333, 'r': 0.07142857142857142},
    'rouge-l': {'f': 0.5362379555927943, 'p': 0.5555555555555556, 'r': 0.5182186234817814},
    'rouge-w-1.2': {'f': 0.3931242798550236, 'p': 0.4734837712933738, 'r': 0.33608419409971513},
    'rouge-s4': {'f': 0.27207237393198186, 'p': 0.2916666666666667, 'r': 0.2549450549450549},
    'rouge-su4': {'f': 0.3338954468802698, 'p': 0.35526315789473684, 'r': 0.3149522799575822}
}
"""

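The example above builds the nested inputs by hand: for each document, the hypothesis is a list of sentences, each a list of tokens, and the references are a list of such summaries. A minimal sketch of producing that layout from raw text with a pluggable tokenizer follows. The helpers build_inputs, to_summary and char_tokenize are hypothetical, sentences are assumed to be newline-separated, and lower-casing is a choice of this sketch; in practice you might plug in nltk for English or jieba for Chinese.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def whitespace_tokenize(sentence: str) -> List[str]:
    """Lower-case and split on whitespace; adequate for simple English text."""
    return sentence.lower().split()


def char_tokenize(sentence: str) -> List[str]:
    """One token per non-space character; a crude fallback for Chinese text."""
    return [ch for ch in sentence if not ch.isspace()]


def to_summary(text: str, tokenize: Tokenizer) -> List[List[str]]:
    """Split newline-separated sentences and tokenize each non-empty one."""
    return [tokenize(line) for line in text.splitlines() if line.strip()]


def build_inputs(hyp_texts, ref_texts, tokenize: Tokenizer = whitespace_tokenize):
    """Nest raw strings into the document -> (reference ->) sentence -> token layout."""
    hypotheses = [to_summary(text, tokenize) for text in hyp_texts]
    references = [[to_summary(text, tokenize) for text in refs] for refs in ref_texts]
    return hypotheses, references


hyps, refs = build_inputs(
    ["How are you\nI am fine"],
    [["How do you do\nFine thanks", "How old are you\nI am three"]],
)
# hyps == [[['how', 'are', 'you'], ['i', 'am', 'fine']]]
```

The resulting hyps and refs have the same shape as the hand-built lists above, so they can be passed to rougex.evaluate in the same way. Keeping the tokenizer a plain function makes it easy to switch per language without touching the rest.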
Evaluate the results using the ROUGE-1.5.5.pl script wrapper. Note that the script only supports English summaries. For non-English summaries, use the Python implementation instead, or convert each token to an integer and join the integers with spaces before evaluation.

import rougex

rouge = rougex.PerlRouge(rouge_n_max=3, rouge_l=True, rouge_w=True,
    rouge_w_weight=1.2, rouge_s=True, rouge_su=True, skip_gap=4)

# Load summary results and evaluate
hypotheses = [
    'how are you\ni am fine',                       # document 1: hypothesis
    'it is fine today\nwe won the football game',   # document 2: hypothesis
]
references = [
    [
        'how do you do\nfine thanks',   # document 1: reference 1
        'how old are you\ni am three',  # document 1: reference 2
    ],
    [
        'it is sunny today\nlet us go for a walk',  # document 2: reference 1
        'it is a terrible day\nwe lost the game',   # document 2: reference 2
    ]
]
scores = rouge.evaluate(hypotheses, references)
print(scores)

# Or evaluate from existing files
hypothesis_dir = 'sample/hypotheses'
reference_dir = 'sample/references'
scores = rouge.evaluate_from_files(hypothesis_dir, reference_dir)
print(scores)
"""Both calls produce the same output, which looks like
{
    'rouge-1': {
        'r': 0.51822, 'r_conf_int': (0.42105, 0.61538),
        'p': 0.55556, 'p_conf_int': (0.44444, 0.66667),
        'f': 0.53622, 'f_conf_int': (0.43243, 0.64)
    },
    'rouge-2': {...},
    'rouge-3': {...},
    'rouge-l': {...},
    'rouge-w-1.2': {...},
    'rouge-s4': {...},
    'rouge-su4': {...}
}
"""

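For non-English summaries, the note above suggests converting tokens to integers before handing them to the Perl script. A minimal sketch of that conversion, assuming one vocabulary shared by hypotheses and references (so identical tokens always map to the same integer) and the newline-separated string format used in the wrapper example; encode_corpus is a hypothetical helper, not part of rougex.

```python
from typing import Dict, List, Tuple

Summary = List[List[str]]  # sentences -> tokens


def encode_corpus(
    hypotheses: List[Summary], references: List[List[Summary]]
) -> Tuple[List[str], List[List[str]], Dict[str, int]]:
    """Map tokens to integer ids and render each summary as newline-separated lines."""
    vocab: Dict[str, int] = {}

    def encode(summary: Summary) -> str:
        lines = []
        for sentence in summary:
            # setdefault assigns the next id on first sight, else reuses the old one.
            ids = [vocab.setdefault(tok, len(vocab)) for tok in sentence]
            lines.append(" ".join(map(str, ids)))
        return "\n".join(lines)

    hyp_strs = [encode(h) for h in hypotheses]
    ref_strs = [[encode(r) for r in refs] for refs in references]
    return hyp_strs, ref_strs, vocab


hyp_strs, ref_strs, vocab = encode_corpus(
    [[["今天", "天气", "好"], ["我们", "赢了"]]],
    [[[["今天", "下雨"]], [["天气", "好"]]]],
)
# hyp_strs == ["0 1 2\n3 4"], ref_strs == [["0 5", "1 2"]]
```

The encoded hyp_strs and ref_strs have the same shape as the string lists in the wrapper example, so they can be passed to rouge.evaluate unchanged. Sharing one vocabulary across the whole corpus is the important design choice: encoding each summary separately would give the same token different ids and destroy the matches.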
Documentation

Visit rougex.readthedocs.io for API documentation.

License

ROUGE-X is released under the MIT License.

rougex
Release 1.0.1
