This tool automates the recording and statistical summary of experimental results (metrics such as accuracy, AUC, and F1). Results can be visualized quickly, and the statistics can be easily exported to txt, xlsx, and other formats. The following features are currently supported:
- Box plot
- Trajectory plot
- Scatter plot
- Bar plot
- Violin plot
- Scott-Knott rank test plot
- A12 effect size plot
- Wilcoxon rank test
- More on the way
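As a rough illustration of what the A12 effect size in the list above measures, the sketch below computes the Vargha-Delaney A12 statistic for two trials in plain Python. It is not taken from metric_visualizer; the function name `a12` and the sample scores are hypothetical and serve only to show the idea.

```python
def a12(xs, ys):
    """Vargha-Delaney A12: estimated probability that a value drawn from xs
    is larger than a value drawn from ys (ties count as one half)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


# Hypothetical F1 scores from repeated runs of two trials
lstm_f1 = [0.71, 0.74, 0.69, 0.73]
cnn_f1 = [0.65, 0.70, 0.66, 0.68]

effect = a12(lstm_f1, cnn_f1)
print(effect)
```

A value near 0.5 means the two trials are practically indistinguishable, while values near 1 or 0 indicate that one trial almost always scores higher than the other.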
For detailed usage and examples, please refer to the example.
If you want to make TikZ (LaTeX) plots, you need to install TeX Live (other LaTeX distributions have not been tested).
pip install metric_visualizer
mvis example.mv
Suppose there are several groups of comparison experiments (or several parameter settings), each called a trial, and each trial produces multiple metrics (e.g., AUC, accuracy, F1, loss). If each trial is repeated n times, the results can be logged as follows (once logged, figures can be plotted automatically):
import random

import numpy as np
from metric_visualizer import MetricVisualizer

MV = MetricVisualizer(name='example', trial_tag='Model')

repeat = 100  # number of repeats
metric_num = 3  # number of metrics

# Use metric_visualizer to monitor the experiments and save the results, so figures can be redrawn at any time
trial_names = ['LSTM', 'CNN', 'BERT']  # fake trial names
# trial_names = ['NSGA-II', 'NSGA-III', 'MOEA/D']  # fake trial names
# trial_names = ['Hyperparameter Setting 1', 'Hyperparameter Setting 2', 'Hyperparameter Setting 3']  # fake trial names

for n_trial in range(len(trial_names)):
    for r in range(repeat):  # repeat the experiments to plot violin or box figure
        # Fake metric values; n is the metric scale factor
        metrics = [(np.random.random() + n + (1 if random.random() > 0.5 else -1)) for n in
                   range(metric_num)]
        for i, m in enumerate(metrics):
            # MV.add_metric(metric_name='metric{}'.format(i + 1), value=m)  # add metric by custom name and value
            MV.log_metric(trial_name=trial_names[n_trial], metric_name='metric{}'.format(i + 1),
                          value=m)  # log the value under its trial name and metric name
    # MV.next_trial()  # next_trial() should be used with add_metric() to add metrics of different trials

# MV.remove_outliers()  # remove outliers
MV.summary(no_print=False)
-------------------- Metric Summary --------------------
╒══════════╤═════════╤══════════════════════════════════════════════════════════════╤═════════════════════════════════════════════════════════════╕
│ Metric   │ Trial   │ Values                                                       │ Summary                                                     │
╞══════════╪═════════╪══════════════════════════════════════════════════════════════╪═════════════════════════════════════════════════════════════╡
│ Metric-1 │ trial-0 │ [0.35, 0.65, 0.67, 0.51, 0.04, 0.43, 0.46, 0.58, 0.11, 0.66] │ ['Avg:0.45, Median: 0.48, IQR: 0.22, Max: 0.67, Min: 0.04'] │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-1 │ trial-1 │ [0.52, 0.1, 0.11, 0.86, 0.49, 0.7, 0.77, 0.96, 0.16, 0.65]   │ ['Avg:0.53, Median: 0.58, IQR: 0.41, Max: 0.96, Min: 0.1']  │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-1 │ trial-2 │ [0.73, 0.99, 0.13, 0.72, 0.63, 0.61, 0.14, 0.85, 0.71, 0.86] │ ['Avg:0.64, Median: 0.72, IQR: 0.17, Max: 0.99, Min: 0.13'] │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-1 │ trial-3 │ [0.99, 0.69, 0.86, 0.2, 0.4, 0.1, 0.05, 0.07, 0.95, 0.31]    │ ['Avg:0.46, Median: 0.36, IQR: 0.62, Max: 0.99, Min: 0.05'] │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-1 │ trial-4 │ [0.58, 0.95, 0.73, 0.63, 0.04, 0.19, 0.5, 0.9, 0.64, 0.89]   │ ['Avg:0.6, Median: 0.64, IQR: 0.27, Max: 0.95, Min: 0.04']  │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-2 │ trial-0 │ [1.58, 1.32, 1.98, 1.76, 1.31, 1.6, 1.6, 1.22, 1.3, 1.19]    │ ['Avg:1.49, Median: 1.45, IQR: 0.29, Max: 1.98, Min: 1.19'] │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-2 │ trial-1 │ [1.88, 1.67, 1.77, 1.94, 1.01, 1.6, 1.25, 1.63, 1.62, 1.91]  │ ['Avg:1.63, Median: 1.65, IQR: 0.21, Max: 1.94, Min: 1.01'] │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-2 │ trial-2 │ [1.4, 1.94, 1.28, 1.78, 1.01, 1.08, 1.21, 1.82, 1.78, 1.18]  │ ['Avg:1.45, Median: 1.34, IQR: 0.59, Max: 1.94, Min: 1.01'] │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-2 │ trial-3 │ [1.79, 1.35, 1.14, 1.5, 1.73, 1.06, 1.98, 1.75, 1.07, 1.49]  │ ['Avg:1.49, Median: 1.5, IQR: 0.49, Max: 1.98, Min: 1.06']  │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-2 │ trial-4 │ [1.93, 1.81, 1.18, 1.08, 1.57, 1.85, 1.95, 1.94, 1.58, 1.35] │ ['Avg:1.62, Median: 1.7, IQR: 0.43, Max: 1.95, Min: 1.08']  │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-3 │ trial-0 │ [2.85, 2.87, 2.3, 2.05, 2.86, 2.34, 2.85, 2.3, 2.95, 2.53]   │ ['Avg:2.59, Median: 2.69, IQR: 0.54, Max: 2.95, Min: 2.05'] │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-3 │ trial-1 │ [2.31, 2.41, 2.34, 2.96, 2.48, 2.68, 2.99, 2.94, 2.01, 2.46] │ ['Avg:2.56, Median: 2.47, IQR: 0.44, Max: 2.99, Min: 2.01'] │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-3 │ trial-2 │ [2.65, 2.5, 2.68, 2.34, 2.32, 2.61, 2.61, 2.88, 2.86, 2.36]  │ ['Avg:2.58, Median: 2.61, IQR: 0.24, Max: 2.88, Min: 2.32'] │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-3 │ trial-3 │ [2.29, 2.12, 2.4, 2.81, 2.5, 2.54, 2.82, 2.61, 2.45, 2.44]   │ ['Avg:2.5, Median: 2.48, IQR: 0.16, Max: 2.82, Min: 2.12']  │
├──────────┼─────────┼──────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┤
│ Metric-3 │ trial-4 │ [2.41, 2.12, 2.31, 2.29, 2.46, 2.95, 2.74, 2.66, 2.34, 2.65] │ ['Avg:2.49, Median: 2.44, IQR: 0.33, Max: 2.95, Min: 2.12'] │
╘══════════╧═════════╧══════════════════════════════════════════════════════════════╧═════════════════════════════════════════════════════════════╛
-------------------- Metric Summary --------------------
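
To make the Summary column above more concrete, here is a minimal sketch of how logged values could be grouped by metric and trial and then summarized, using only the Python standard library. It is not metric_visualizer's implementation: the `records` layout and the `log_value` and `summarize` helpers are hypothetical, and the quartile convention used here (linear interpolation) is an assumption, so the IQR it produces need not match the table.

```python
import statistics
from collections import defaultdict

# Hypothetical layout: records[metric_name][trial_name] -> list of logged values
records = defaultdict(lambda: defaultdict(list))


def log_value(trial_name, metric_name, value):
    """Append one observation for a given trial and metric."""
    records[metric_name][trial_name].append(value)


def summarize(values):
    """Return average, median, interquartile range, max, and min of a sample."""
    # Quartiles by linear interpolation between data points (an assumed convention)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Avg": statistics.mean(values),
        "Median": statistics.median(values),
        "IQR": q3 - q1,
        "Max": max(values),
        "Min": min(values),
    }


# Values copied from the Metric-1 / trial-0 row of the table (already rounded there)
for v in [0.35, 0.65, 0.67, 0.51, 0.04, 0.43, 0.46, 0.58, 0.11, 0.66]:
    log_value("trial-0", "Metric-1", v)

stats = summarize(records["Metric-1"]["trial-0"])
print({name: round(value, 2) for name, value in stats.items()})
```

Because the table prints values rounded to two decimals, statistics recomputed from them can differ slightly from those reported by the tool.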