mt-thresholds

Tool to check how metric deltas for machine translation translate into system-level accuracy against human judgments.


Keywords
machine-translation, evaluation, metrics
Install
pip install mt-thresholds==0.0.4

Documentation

MT Metrics Thresholds

Code for Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies by Tom Kocmi, Vilém Zouhar, Christian Federmann, and Matt Post.

@misc{kocmi2024navigating,
      title={Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies}, 
      author={Tom Kocmi and Vilém Zouhar and Christian Federmann and Matt Post},
      year={2024},
      eprint={2401.06760},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
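The accuracy notion used throughout (following the paper) is pairwise system-level agreement: the fraction of system pairs where the metric's delta points in the same direction as the human judgment. The following is a toy sketch of that idea with invented scores; it is not the package's implementation, which fits curves to WMT human data.

```python
# Toy sketch of pairwise system-level accuracy: the fraction of system
# pairs where the sign of the metric delta matches the sign of the
# human delta. Scores below are invented for illustration only.
from itertools import combinations

def pairwise_accuracy(metric_scores, human_scores):
    """Fraction of system pairs ranked the same way by metric and humans."""
    pairs = list(combinations(range(len(metric_scores)), 2))
    agree = sum(
        1 for i, j in pairs
        if (metric_scores[i] - metric_scores[j])
           * (human_scores[i] - human_scores[j]) > 0
    )
    return agree / len(pairs)

# Hypothetical scores for four MT systems (not real data)
metric = [30.1, 28.5, 27.9, 25.0]   # e.g. BLEU
human  = [80.0, 78.0, 78.5, 70.0]   # e.g. human ratings

print(pairwise_accuracy(metric, human))  # 5 of 6 pairs agree
```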

Web frontend

See the MT thresholds tool.

Local tool

pip3 install mt-thresholds

# accuracy is 63.989%
mt-thresholds bleu 1.00

# ChrF needs 0.710 difference for the same accuracy as BLEU
mt-thresholds chrf 0.63989 --delta

Or use from Python:

import mt_thresholds

mt_thresholds.accuracy(1.0, "bleu") # 0.63989
mt_thresholds.delta(0.63989, "chrf") # 0.665
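Conceptually, `delta` inverts `accuracy`: given a target accuracy, it finds the metric delta that reaches it. Since accuracy grows monotonically with the delta, such an inverse can be computed by bisection. Below is a hedged sketch of that idea using an invented toy accuracy curve, not the package's fitted curves.

```python
# Toy sketch: recovering the delta that yields a target accuracy by
# bisecting a monotone accuracy curve. The curve here is invented for
# illustration; the real package derives its curves from WMT human data.
import math

def toy_accuracy(delta):
    # Invented monotone curve: rises from 0.5 (coin flip) toward 1.0.
    return 0.5 + 0.5 * (1 - math.exp(-delta))

def toy_delta(target_acc, lo=0.0, hi=100.0, iters=60):
    # Bisection: shrink [lo, hi] until toy_accuracy(mid) hits the target.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if toy_accuracy(mid) < target_acc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(toy_delta(0.75), 3))  # ln(2) ≈ 0.693 for this toy curve
```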

Experiment code

We plan to release the code for replicating the WMT results in the coming months.