rougex

A fast Python implementation for full ROUGE scores.


Keywords
rouge, summarization, natural, language, processing, computational, linguistics
License
MIT
Install
pip install rougex==1.0.1


ROUGE-X


A fast Python implementation for full ROUGE scores, producing the same results as the official ROUGE-1.5.5.pl Perl script. A Python wrapper of the script is also available.

Features

  • Full ROUGE support: Implements ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S and ROUGE-SU scores, with support for multiple references.
  • High speed: Pure Python implementation that does not spawn an external process.
  • Correctness: Produces exactly the same results as ROUGE-1.5.5.pl for all ROUGE scores in the single-document scenario. For multi-document evaluation, the results may differ slightly, because we directly average the scores across documents, whereas the Perl script additionally applies bootstrap resampling.
  • Flexible and multi-lingual: We only operate on language-agnostic tokens and treat a sentence as a list of tokens. Language-aware pre-processing and tokenization are left to the user. You may use a different tokenizer for each language, such as nltk for English and jieba for Chinese.
  • Multiple backends: Besides the Python implementation, we also provide an API that wraps the Perl script, so summaries can be evaluated with it programmatically.
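To make the scores listed above more concrete, here is a minimal sketch of ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest common subsequence) for one tokenized hypothesis against one reference, plus the plain cross-document averaging mentioned under Correctness. It is an illustration only, not the rougex implementation: the function names are hypothetical, the F-score is the balanced F1, and it does not reproduce multi-reference handling or the Perl script's summary-level conventions.

```python
from collections import Counter
from typing import Dict, List, Sequence


def _prf(hits: int, hyp_total: int, ref_total: int) -> Dict[str, float]:
    """Precision, recall and balanced F1 from overlap counts (0.0 when undefined)."""
    p = hits / hyp_total if hyp_total else 0.0
    r = hits / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(hyp: Sequence[str], ref: Sequence[str], n: int) -> Dict[str, float]:
    """ROUGE-N for a single hypothesis/reference pair (hypothetical helper)."""
    h, r = ngrams(hyp, n), ngrams(ref, n)
    hits = sum((h & r).values())  # Counter '&' keeps the minimum count of each n-gram
    return _prf(hits, sum(h.values()), sum(r.values()))


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(hyp: Sequence[str], ref: Sequence[str]) -> Dict[str, float]:
    """Sentence-level ROUGE-L: LCS length over hypothesis / reference lengths."""
    return _prf(lcs_length(hyp, ref), len(hyp), len(ref))


def corpus_average(per_doc: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-document scores directly (no bootstrap resampling)."""
    return {k: sum(d[k] for d in per_doc) / len(per_doc) for k in ("p", "r", "f")}
```

For example, `rouge_n("the cat sat on the mat".split(), "the cat is on the mat".split(), 2)` finds 3 shared bigrams out of 5 on each side, giving P = R = F = 0.6.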

Installation

Install a stable version from PyPI.

pip install rougex

Or install the latest version from GitHub.

pip install git+https://github.com/li-plus/rougex.git@master

Configuration

If you want to use the ROUGE-1.5.5.pl script, you may need further configuration. Otherwise, you can skip this section.

Configure on Linux

Everything should work fine on Linux. No further configuration is needed.

Configure on Windows

  • Install Strawberry Perl and add its binary folder to PATH. Run perl --version in your command line; if the version information is printed, the installation was successful.
  • Run pip show rougex to find where the package is installed, say d:\softwares\python38\lib\site-packages. Then regenerate the WordNet-2.0.exc.db file inside the rougex package by running:
cd d:\softwares\python38\lib\site-packages\rougex\RELEASE-1.5.5\data\
del WordNet-2.0.exc.db
perl WordNet-2.0-Exceptions/buildExeptionDB.pl ./WordNet-2.0-Exceptions ./smart_common_words.txt ./WordNet-2.0.exc.db
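As an alternative to reading the path from pip show, the data directory can also be located from Python. This is a sketch assuming the package is importable as `rougex` and contains the `RELEASE-1.5.5/data` folder used above; the helper names `rouge_data_dir` and `regenerate_exception_db` are hypothetical, and the rebuild step simply runs the same Perl command listed above.

```python
import importlib.util
import subprocess
from pathlib import Path


def rouge_data_dir(package: str = "rougex") -> Path:
    """Return <package dir>/RELEASE-1.5.5/data (hypothetical helper, assumes this layout)."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"package {package!r} is not installed")
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir / "RELEASE-1.5.5" / "data"


def regenerate_exception_db(data_dir: Path) -> None:
    """Delete WordNet-2.0.exc.db and rebuild it with the Perl command above (needs perl on PATH)."""
    (data_dir / "WordNet-2.0.exc.db").unlink(missing_ok=True)  # Python 3.8+
    subprocess.run(
        ["perl", "WordNet-2.0-Exceptions/buildExeptionDB.pl",
         "./WordNet-2.0-Exceptions", "./smart_common_words.txt", "./WordNet-2.0.exc.db"],
        cwd=data_dir,
        check=True,  # raise CalledProcessError if the Perl script fails
    )
```

Usage would then be `regenerate_exception_db(rouge_data_dir())`.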

Configure on macOS

TODO: I will update it once I get a MacBook.

Quick Start

Evaluate the results using the pure Python implementation.

import rougex

# Pre-process and tokenize the summaries as you like
hypotheses = [
    ['how are you'.split(), 'i am fine'.split()],                       # document 1: hypothesis
    ['it is fine today'.split(), 'we won the football game'.split()],   # document 2: hypothesis
]
references = [
    [
        ['how do you do'.split(), 'fine thanks'.split()],   # document 1: reference 1
        ['how old are you'.split(), 'i am three'.split()],  # document 1: reference 2
    ],
    [
        ['it is sunny today'.split(), 'let us go for a walk'.split()],  # document 2: reference 1
        ['it is a terrible day'.split(), 'we lost the game'.split()],   # document 2: reference 2
    ]
]
# Start evaluation
scores = rougex.evaluate(hypotheses, references, rouge_n=(1,2,4), rouge_l=True,
    rouge_w=True, rouge_w_weight=1.2, rouge_s=True, rouge_su=True, skip_gap=4)
print(scores)
"""The output will be
{
    'rouge-1': {'f': 0.5362379555927943, 'p': 0.5555555555555556, 'r': 0.5182186234817814},
    'rouge-2': {'f': 0.20347597966879813, 'p': 0.2125, 'r': 0.19518716577540107},
    'rouge-4': {'f': 0.07692307692307691, 'p': 0.08333333333333333, 'r': 0.07142857142857142},
    'rouge-l': {'f': 0.5362379555927943, 'p': 0.5555555555555556, 'r': 0.5182186234817814},
    'rouge-w-1.2': {'f': 0.3931242798550236, 'p': 0.4734837712933738, 'r': 0.33608419409971513},
    'rouge-s4': {'f': 0.27207237393198186, 'p': 0.2916666666666667, 'r': 0.2549450549450549},
    'rouge-su4': {'f': 0.3338954468802698, 'p': 0.35526315789473684, 'r': 0.3149522799575822}
}
"""

Evaluate the results using the ROUGE-1.5.5.pl script wrapper. Note that the script only supports English summaries. For non-English summaries, use the Python implementation instead, or map the tokens to integers separated by spaces before evaluation.
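The integer trick can be sketched as follows, assuming the text has already been tokenized (for example with jieba) into space-separated tokens with one sentence per line. One vocabulary must be shared across all hypotheses and references so that identical tokens receive identical IDs. The idea is that digits presumably survive the script's English-oriented text normalization, whereas characters from other scripts may not. `encode_as_integers` is a hypothetical helper, not part of rougex.

```python
from typing import Dict


def encode_as_integers(text: str, vocab: Dict[str, int]) -> str:
    """Replace each whitespace-separated token with an integer ID, one sentence per line.

    New tokens are added to the shared `vocab` with the next free ID.
    """
    lines = []
    for line in text.splitlines():
        ids = (str(vocab.setdefault(tok, len(vocab))) for tok in line.split())
        lines.append(" ".join(ids))
    return "\n".join(lines)
```

With `vocab = {}`, encoding `'我 爱 北京\n北京 欢迎 你'` gives `'0 1 2\n2 3 4'`, and encoding `'我 爱 你'` afterwards with the same dictionary gives `'0 1 4'`.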

import rougex

rouge = rougex.PerlRouge(rouge_n_max=3, rouge_l=True, rouge_w=True,
    rouge_w_weight=1.2, rouge_s=True, rouge_su=True, skip_gap=4)

# Load summary results and evaluate
hypotheses = [
    'how are you\ni am fine',                       # document 1: hypothesis
    'it is fine today\nwe won the football game',   # document 2: hypothesis
]
references = [
    [
        'how do you do\nfine thanks',   # document 1: reference 1
        'how old are you\ni am three',  # document 1: reference 2
    ],
    [
        'it is sunny today\nlet us go for a walk',  # document 2: reference 1
        'it is a terrible day\nwe lost the game',   # document 2: reference 2
    ]
]
scores = rouge.evaluate(hypotheses, references)
print(scores)

# Or evaluate from existing files
hypothesis_dir = 'sample/hypotheses'
reference_dir = 'sample/references'
scores = rouge.evaluate_from_files(hypothesis_dir, reference_dir)
print(scores)
"""The above two outputs are the same, like
{
    'rouge-1': {
        'r': 0.51822, 'r_conf_int': (0.42105, 0.61538),
        'p': 0.55556, 'p_conf_int': (0.44444, 0.66667),
        'f': 0.53622, 'f_conf_int': (0.43243, 0.64)
    },
    'rouge-2': {...},
    'rouge-3': {...},
    'rouge-l': {...},
    'rouge-w-1.2': {...},
    'rouge-s4': {...},
    'rouge-su4': {...}
}
"""

Documentation

Visit rougex.readthedocs.io for API documentation.

License

ROUGE-X is released under the MIT License.