A tool set of python modules for profile data analysis on Fujitsu's PRIMEHPC FX10 supercomputer

hpc, profiler
pip install pyproffx==0.9.3



A tool set of python modules for the performance profiler on Fujitsu's PRIMEHPC FX10 supercomputer


$ pip install pyproffx

Quick guide

First, I strongly recommend that you use this module with IPython, which has a powerful interactive shell, featured by tab-completion.

When you have installed pyproffx successfuly, you can import the module.

$ ipython
>>> import pyproffx as pfx

If you don't have your profile data, you can find example data sets in the source code.

>>> %ls example/
output_prof_1.csv    output_prof_2.csv    output_prof_3.csv ...

Before loading profiling results, you may need to know the label of the measured region.

>>> fp = '~/path/to/data/'
>>> pfx.program_info(fp)
{'labels': ['__for_accumulate_estimates', '__flip_operator_and_spins'],
 'num_procs': 4,
 'num_threads': 16}

Then identify the label you want and give it to the loader function.

>>> p, d = pfx.load_pa(fp, '__for_accumulate_estimates')

Now you can access the PC counts mesured by the profiler. The example below extracts the counts of L2 demand miss of a thread, which is identified by process ID 1 and thread ID 0.

>>> d.L2_miss_dm('T', 1, 0)

The string 'T' means 'Thread' monitor level, and there are two more monitor levels: process ('P' or 'Process') and application ('A' or 'Application').

Note that the type of return value is always numpy.array.

If you want to know which counters are available, tab-completion in IPython is helpful to show you all.

>>> d.[tab-completion]
d.L1D_miss        d.Reserved31                   d.cse_window_empty_sp_full   d.end1op
d.L1D_thrashing   d.Reserved32                   d.cycle_counts1              d.end2op
d.L1I_miss        d.SIMD_fl_load_instructions    d.cycle_counts2              d.end3op
d.L1I_thrashing   d.SIMD_fl_store_instructions   d.cycle_counts3              d.eu_comp_wait

You can get multi-process/multi-thread results by giving the range of process/thread id.

>>> d.L2_miss_dm('T', (0, 3), (0, 8))
array([193234830, 191787314, 191129047, 192478642, 166060260, 191627643,
       ...                                               ..., 193335331)]

Thre are some performance indices computed by using the combination of the PC counters.

>>> dperf = pfx.Performance(p, d)
>>> dperf.elapsed_time('T', 0, (0, 8))
array([ 62.5973813 ,  62.09905636,  61.46894915,  62.87457782,
        61.68163378,  62.35979694,  60.15607663,  60.92197067])