PEAMT: a Perceptual Evaluation metric for Automatic Music Transcription

License: MIT

Install: pip install peamt==0.2.0

This repository contains code to train and run PEAMT, a Perceptual Evaluation metric for Automatic Music Transcription. If you use any of this, please cite:

Adrien Ycart, Lele Liu, Emmanouil Benetos and Marcus Pearce, 2020. "Investigating the Perceptual Validity of Evaluation Metrics for Automatic Piano Music Transcription", Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1), pp.68–81.

    @article{ycart2019PEAMT,
       Author = {Ycart, Adrien and Liu, Lele and Benetos, Emmanouil and Pearce, Marcus},
       Journal = {Transactions of the International Society for Music Information Retrieval (TISMIR)},
       Title = {Investigating the Perceptual Validity of Evaluation Metrics for Automatic Piano Music Transcription},
       Year = {2020},
       Volume = {3},
       Number = {1},
       Pages = {68--81},
       DOI = {10.5334/tismir.57},
    }

Getting started

This library is compatible with Python 2.7 and 3.6.

To install the dependencies necessary to run the evaluation, run: $ pip install -r requirements.txt

To install the dependencies necessary to train your own metric, run: $ pip install -r requirements_training.txt

Evaluating some outputs

Here is a code sample to evaluate some outputs:

from peamt import PEAMT

output_filename = 'output.mid'
target_filename = 'target.mid'

# Instantiate the metric with the released parameters
metric = PEAMT()
value = metric.evaluate_from_midi(target_filename, output_filename)

metric.evaluate_from_midi can take either a filename string or a pretty_midi.PrettyMIDI object as input.
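For instance, the same evaluation can be run on already-loaded MIDI data (a minimal sketch based on the statement above):

import pretty_midi as pm
from peamt import PEAMT

# evaluate_from_midi also accepts pretty_midi.PrettyMIDI objects
target_data = pm.PrettyMIDI('target.mid')
output_data = pm.PrettyMIDI('output.mid')

metric = PEAMT()
value = metric.evaluate_from_midi(target_data, output_data)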

Make sure the target MIDI file contains correct velocities and sustain-pedal activations (control change #64).

To run directly from lists of notes, use metric.evaluate. Some useful processing functions can be found in features.utils, such as apply_sustain_control_changes and get_notes_intervals; see the sketch below.
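As an indication, here is a sketch of that list-based entry point. The argument names and order passed to evaluate are assumptions (some features may also need target velocities), so check the docstring of evaluate in your installed version:

import pretty_midi as pm

from peamt import PEAMT
import features.utils as utils

target_data = pm.PrettyMIDI('target.mid')
output_data = pm.PrettyMIDI('output.mid')

# Fold sustain-pedal (CC#64) activations into note durations before extracting notes
target_data = utils.apply_sustain_control_changes(target_data)

# Pitch lists and (onset, offset) interval arrays
notes_output, intervals_output = utils.get_notes_intervals(output_data)
notes_target, intervals_target = utils.get_notes_intervals(target_data)

metric = PEAMT()
# Assumed argument order; verify against the evaluate docstring
value = metric.evaluate(notes_output, intervals_output, notes_target, intervals_target)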

Training a new metric

To train a new metric, you first need to download the ratings and MIDI files at this address. Then, run $ python export_features.py <MIDI_files_location> <answers_csv_path> <output_folder> to precompute the features and save them for each (target, output) pair.

Then use the script classifier.py. Create a new @ex.named_config following the same pattern as export_metric. Edit the features_to_use and features_to_remove entries of the cfg dictionary to specify which features to include or leave out, respectively; a sketch follows below.
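As a sketch, such a named config could look like the following; the feature names are illustrative, and the exact shape of cfg should be copied from export_metric:

# In classifier.py, alongside the existing named configs.
# Feature names below are illustrative; copy the full structure from export_metric.
@ex.named_config
def my_metric():
    cfg = {
        'features_to_use': ['notewise_On_F', 'framewise_F'],
        'features_to_remove': ['repeat_all'],
    }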

To run the script, run $ python classifier.py with <your_config_name>.

To use the obtained parameters to evaluate some outputs, use: metric = PEAMT(parameters=<your_parameters_filepath>).
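For instance, with a placeholder path standing in for the parameters file produced by classifier.py:

from peamt import PEAMT

# 'my_parameters' is a placeholder for the file written by classifier.py
metric = PEAMT(parameters='my_parameters')
value = metric.evaluate_from_midi('target.mid', 'output.mid')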

Using individual features

The individual features can also be used:

import pretty_midi as pm

from features.rhythm import rhythm_histogram
import features.utils as utils


output_filename = 'output.mid'
target_filename = 'target.mid'

output_data = pm.PrettyMIDI(output_filename)
target_data = pm.PrettyMIDI(target_filename)

# Extract pitch lists and (onset, offset) intervals from each MIDI file
notes_output, intervals_output = utils.get_notes_intervals(output_data)
notes_target, intervals_target = utils.get_notes_intervals(target_data)

# Compute the rhythm histogram features of the output and of its difference from the target
rhythm_hist_out, rhythm_hist_diff = rhythm_histogram(intervals_output, intervals_target)

More info

For more information on how the perceptual data was collected and how the metric is trained, please refer to the corresponding paper.

For a detailed definition of each of the features, please refer to: Technical report - Musical Features for Automatic Music Transcription Evaluation

Weights of features in the released metric

As an indication, we give here the weight of each feature in the released metric, ranked by magnitude. It appears that the most important feature is the notewise onset-only F-measure, followed by rhythm features and other benchmark metrics. Framewise mistakes in the lowest voice also seem to have some influence on the final result.

Rank Feature ID Weight
0 notewise_On_F 0.6163901
1 rhythm_disp_drift_mean 0.58185416
2 rhythm_disp_drift_max -0.51437545
3 framewise_R 0.3728488
4 rhythm_disp_std_mean -0.36873978
5 rhythm_hist_out -0.28920627
6 low_f_F -0.27877852
7 notewise_OnOff_R -0.25945878
8 framewise_F 0.21865314
9 rhythm_hist_diff 0.21851824
10 low_f_R 0.1989684
11 notewise_On_R -0.1954612
12 low_f_P 0.19219641
13 poly_diff_mean 0.18691508
14 rhythm_disp_drift_min -0.18283454
15 high_n_F 0.16963542
16 merge_fp -0.16701007
17 high_f_F -0.15762798
18 notewise_OnOff_F 0.1448518
19 poly_diff_std -0.14053902
20 merge_all 0.109077565
21 loud_ratio_fn -0.10902009
22 high_f_R 0.10647323
23 low_n_F -0.10201532
24 low_n_R 0.09830448
25 repeat_fp -0.097474664
26 high_n_P 0.08767299
27 poly_diff_min -0.07279301
28 loud_fn -0.07112731
29 notewise_On_P -0.060689207
30 notewise_OnOff_P 0.05645829
31 poly_diff_max 0.05339441
32 low_n_P 0.047426596
33 high_f_P -0.045074914
34 high_n_R 0.041347794
35 rhythm_disp_std_max 0.034391385
36 framewise_P -0.024282124
37 rhythm_disp_std_min 0.012041504
38 repeat_all -0.0018782051