YARD - Yet Another ROC Drawer
Author: Tamás Nepusz

This is yet another Python package for drawing ROC curves. It also lets you draw precision-recall, accumulation and concentrated ROC (CROC) curves, sensitivity-specificity plots and F-score curves, and calculate AUC (area under curve) statistics. The significance of differences between AUC scores can also be tested using paired permutation tests.
Where to get yard
yard has two homes at the moment:
 The Python package index. This page hosts the most recent stable version of yard. Since yard is under heavy development at the moment, you might not get all the latest and greatest features of yard, but you will most likely find a version here that should not collapse even under exceptional circumstances.
 A page on GitHub. On this page you can follow the development of yard as closely as possible; you can get the most recent development version, file bug reports, or even fork the project to start adding your own features.
Requirements
You will need the following tools to run yard:
 Python 2.6 or later.
 Matplotlib, which is responsible for plotting the curves. If you don't have Matplotlib, you can export the points of the curves and then use an external plotting tool such as GNUPlot to plot them later.

NumPy is an optional dependency; some functions will be slightly faster if you have NumPy, but yard should work fine without it as well.
Installation
The simplest way to install yard is by using easy_install:
$ easy_install yard
This goes to the Python package index page, fetches the most recent stable version and installs it, creating two scripts in your path: yard-plot for plotting and yard-significance for significance testing.
If you want the bleeding edge version, you should go to the GitHub page, download a ZIP or .tar.gz file, extract it to some directory and then run the following command:
$ python setup.py install
Running yard
yard works with simple tabular flat files and assumes that the first row in each file is a header. Each row contains data related to a particular test example. By default, the first column contains the expected outcome of a binary classifier for a given test example (i.e. whether the example is positive or negative), while the remaining columns contain the outputs of the probabilistic classifiers being tested on the test dataset. The expected outcome must be positive for positive examples and zero or negative for negative examples; this means that you may use either zeros and ones, or -1 and +1, for negative and positive test examples, respectively. The outputs of the classifiers may be in any range, but they are most frequently in the interval [0, 1]. The following snippet shows an example input file:
output  Method1  Method2  Method3
-1      0.2      0.3      0.6
-1      0.4      0.15     0.1
+1      0.7      0.2      0.9
+1      0.3      0.85     1.0
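If the classifier outputs already live in Python, a file in this layout can be produced with the standard csv module. The following is only a minimal sketch: the helper name write_yard_input and the in-memory buffer are illustrative assumptions and not part of yard.

```python
import csv
import io


def write_yard_input(handle, method_names, rows):
    """Write a header row plus one tab-separated row per test example.

    rows is a list of (expected_label, [score, ...]) pairs.
    write_yard_input is a hypothetical helper, not a yard function.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["output"] + list(method_names))
    for label, scores in rows:
        if len(scores) != len(method_names):
            raise ValueError("each row needs one score per method")
        writer.writerow([label] + list(scores))


# The same data as in the example input file above.
rows = [
    (-1, [0.2, 0.3, 0.6]),
    (-1, [0.4, 0.15, 0.1]),
    (+1, [0.7, 0.2, 0.9]),
    (+1, [0.3, 0.85, 1.0]),
]

buffer = io.StringIO()  # swap for open("input_data.txt", "w") to write a file
write_yard_input(buffer, ["Method1", "Method2", "Method3"], rows)
text = buffer.getvalue()
```

Positive labels are written as 1 rather than +1 here; both are positive numbers, so they satisfy the rule described above.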
Columns must be separated by tabs by default, but this can be overridden with the -f option on the command line. The actual columns being used can also be overridden using -c; for instance, if you have the expected outcome in column 4 and the actual outcomes in columns 1-3, you may use -c 4,1-3 to specify that.
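To make the column specification concrete, the sketch below shows one way a spec such as 4,1-3 could be read: the expected outcome first, then the classifier columns. It assumes that columns are numbered from 1 and that ranges are inclusive; parse_column_spec and select_columns are hypothetical helpers and do not reflect how yard parses its options internally.

```python
def parse_column_spec(spec):
    """Turn a spec such as '4,1-3' into zero-based column indices.

    Assumption: columns are numbered from 1 and ranges are inclusive.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            # '1-3' covers columns 1, 2 and 3, i.e. indices 0, 1 and 2.
            indices.extend(range(int(start) - 1, int(end)))
        else:
            indices.append(int(part) - 1)
    return indices


def select_columns(line, spec, separator="\t"):
    """Pick the fields named by spec from one line of the input file."""
    fields = line.rstrip("\n").split(separator)
    return [fields[i] for i in parse_column_spec(spec)]


# Expected outcome in column 4, classifier outputs in columns 1-3,
# with semicolons as the field separator.
picked = select_columns("0.2;0.3;0.6;-1\n", "4,1-3", separator=";")
```

With this reading, picked holds the expected outcome followed by the three classifier outputs.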
Some usage examples are presented here; for more details, type yard-plot --help or yard-significance --help.
To show a ROC curve for an arbitrary number of classifiers where the expected and actual outcomes are defined in input_data.txt:
$ yard-plot input_data.txt
If the actual outcomes are in columns 3-5, the expected outcome is in column 6 and the columns are separated by semicolons:
$ yard-plot -f ';' -c 6,3-5 input_data.txt
To plot precision-recall curves instead of ROC curves and also show the AUC statistics:
$ yard-plot -t pr --show-auc input_data.txt
Supported curve types are: roc for ROC curves (default), pr for precision-recall curves, croc for CROC curves, ac for accumulation curves, sespe for sensitivity-specificity plots, and fscore for F-score curves.
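For readers unfamiliar with how such curves are built from the input columns, the sketch below derives ROC and precision-recall points by sweeping a threshold over the scores of one classifier. It is an illustration only and not yard's implementation: it assumes that a score at or above the threshold counts as a positive prediction, that both classes are present, and the function names are made up for this example.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        actual_positive = label > 0  # zero or negative labels are negatives
        if score >= threshold:
            if actual_positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual_positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
        points.append((fp / float(fp + tn), tp / float(tp + fn)))
    return points


def precision_recall_points(labels, scores):
    """Return (recall, precision) pairs."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
        points.append((tp / float(tp + fn), tp / float(tp + fp)))
    return points


# Expected outcomes and the Method1 column of the example input file.
labels = [-1, -1, +1, +1]
method1 = [0.2, 0.4, 0.7, 0.3]
roc = roc_points(labels, method1)
pr = precision_recall_points(labels, method1)
```

The other curve types use the same four counts and differ only in which two quantities are plotted against each other.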
To use a logarithmic X axis for the ROC curve and use the standard input instead of a file:
$ yard-plot -l x
The omission of an input filename instructs yard-plot to use the standard input. You could also have used - in place of the filename to specify that.
To save a ROC curve into a PDF file:
$ yard-plot -o roc_curve.pdf input_data.txt
You may specify other formats as long as they are supported by Matplotlib:
$ yard-plot -o roc_curve.ps input_data.txt
$ yard-plot -o roc_curve.png input_data.txt
The PDF backend also supports multiple plots in separate pages:
$ yard-plot -t pr -t roc -t croc -o curves.pdf input_data.txt
The figure size, the DPI ratio and the font size can also be adjusted:
$ yard-plot -o roc_curve.pdf --font-size 8 -s '8cm x 6cm' input_data.txt
To calculate the AUC statistics for multiple curves without plotting them:
$ yard-auc -t pr -t roc input_data.txt
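As a rough illustration of what an AUC statistic for a ROC curve measures, the sketch below computes it in two common ways: by the trapezoid rule over curve points, and as the fraction of positive-negative pairs in which the positive example scores higher (ties counting one half). Neither function is taken from yard; the names and the tie handling are assumptions made for this example.

```python
def trapezoid_area(points):
    """Area under a piecewise linear curve given as (x, y) pairs sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for l, s in zip(labels, scores) if l > 0]
    negatives = [s for l, s in zip(labels, scores) if l <= 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Expected outcomes and the Method1 column of the example input file.
labels = [-1, -1, +1, +1]
method1 = [0.2, 0.4, 0.7, 0.3]

# ROC points for Method1, obtained by lowering the threshold step by step.
roc = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]

auc_from_curve = trapezoid_area(roc)            # 0.75
auc_from_pairs = pairwise_auc(labels, method1)  # 0.75
```

Both approaches give 0.75 for this column because three of the four positive-negative pairs are ranked correctly.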
To test whether the ROC curves of multiple classifiers are significantly different:
$ yard-significance input_data.txt
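The idea behind a paired permutation test on AUC differences can be sketched as follows. This is a generic illustration and not necessarily the exact procedure yard follows: it assumes that the two classifiers produce scores on comparable scales, so that swapping them per test example is meaningful, and it uses the common add-one correction for the p-value. All names are hypothetical.

```python
import random


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for l, s in zip(labels, scores) if l > 0]
    negatives = [s for l, s in zip(labels, scores) if l <= 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def paired_permutation_test(labels, scores_a, scores_b, rounds=1000, seed=0):
    """Estimate a two-sided p-value for the AUC difference of two classifiers."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    observed = abs(pairwise_auc(labels, scores_a) - pairwise_auc(labels, scores_b))
    at_least_as_extreme = 0
    for _ in range(rounds):
        perm_a, perm_b = [], []
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the two scores are exchangeable,
            # so swap them for this test example with probability 1/2.
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        diff = abs(pairwise_auc(labels, perm_a) - pairwise_auc(labels, perm_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / float(rounds + 1)


# Expected outcomes plus the Method1 and Method3 columns of the example file.
labels = [-1, -1, +1, +1]
method1 = [0.2, 0.4, 0.7, 0.3]
method3 = [0.6, 0.1, 0.9, 1.0]
p_value = paired_permutation_test(labels, method1, method3, rounds=200)
```

With only four test examples the estimate is not informative; the example merely shows the mechanics of the test.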
Questions, comments
If you have a question or comment about yard or you think you have found a bug, feel free to contact me.
Acknowledgments and references
The inclusion of CROC curves and the statistical significance testing was inspired by the following publication (which also provides more details on what CROC curves are and why they are more useful than ROC curves in many cases):
A CROC Stronger than ROC: Measuring, Visualizing and Optimizing Early Retrieval. S. Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily and Pierre Baldi. Bioinformatics, April 2010, doi:10.1093/bioinformatics/btq140
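For a quick impression of what a CROC curve does, the sketch below applies an exponential magnification function of the kind discussed in the publication above to the false positive rate axis of a ROC curve, which stretches the early part of the curve. The value alpha = 7 is only an example chosen for this sketch, and the function names are hypothetical; nothing here is taken from yard's source.

```python
import math


def exponential_magnifier(x, alpha=7.0):
    """Map a false positive rate in [0, 1] onto [0, 1], stretching values near 0."""
    return (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))


def croc_points(roc_points, alpha=7.0):
    """Transform the x coordinate of each (FPR, TPR) pair; TPR is unchanged."""
    return [(exponential_magnifier(fpr, alpha), tpr) for fpr, tpr in roc_points]


# ROC points of a hypothetical classifier.
roc = [(0.0, 0.0), (0.1, 0.6), (0.5, 0.9), (1.0, 1.0)]
croc = croc_points(roc)
```

After the transformation, the region of low false positive rates occupies a much larger share of the horizontal axis, so early retrieval performance contributes more to the area under the curve.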