MARPLE - Multi-tool Analysis of Runtime Performance on Linux Environments
MARPLE is a performance analysis and visualisation tool for Linux. It unifies a wide variety of pre-existing Linux tools such as perf and eBPF into one, simple user interface. MARPLE uses these Linux tools to collect data and write it to disk, and provides a variety of visualisation tools to display the data.
Install Python 3 and Git.
$ sudo apt-get update $ sudo apt-get install python3 git
$ sudo python3 -m pip install marple
sudois not necessary. However, without it
pipwill by default install MARPLE scripts at
~/.local/bin, which is not usually included in the
PATHvariable. Ensure it is before proceeding.
Setup MARPLE - either manually or automatically.
Automatic setup (beta)
This will install the various dependencies required by MARPLE (see below). Note in particular that the FD.io VPP repository will be installed into your home directory.
$ sudo apt-get install perl python3-tkinter libgtk2.0-dev linux-tools-generic linux-tools-common linux-tools-`uname -r`
- Install BCC (see instructions)
- Install G2
# Clone repository git clone https://gerrit.fd.io/r/vpp cd vpp git reset --hard 4146c65f0dd0b5412746064f230b70ec894d2980 # Setup cd src libtoolize aclocal autoconf automake --add-missing autoreconf # Install cd ../build-root make --silent g2-install
MARPLE is now installed, and can be run by invoking the
marplecommand, as described in the usage section.. When you first use MARPLE, it will create a config file at
~/.marpleconfig. If you have installed G2 manually, ensure that the config file has the correct path to your G2 executable.
You can also run MARPLE unit tests:
MARPLE can be separately invoked to either collect or display data.
usage: sudo marple (--collect | --display) [-h] [options]
usage: sudo marple (-c | -d) [-h] [options]
You may wish to set an alias in your
alias marple="sudo `which marple`"
usage: sudo marple --collect [-h] [-o OUTFILE] [-t TIME] subcommand [subcommand ...] Collect performance data. optional arguments: -h, --help show this help message and exit subcommand interfaces to use for data collection. When multiple interfaces are specified, they will all be used to collect data simultaneously. Users can also define their own groups of interfaces using the config file. Built-in interfaces are: cpusched, disklat, mallocstacks, memleak, memtime, callstack, ipc, memevents, diskblockrq, perf_malloc, lib Current user-defined groups are: boot: memleak,cpusched,disklat -o OUTFILE, --outfile OUTFILE specify the data output file. By default this will create a directory named 'marple_out' in the current working directory, and files will be named by date and time. Specifying a file name will write to the 'marple_out' directory - pass in a path to override the save location too. -t TIME, --time TIME specify the duration for data collection (in seconds)
The subommands are listed below. Each one collects either event-based, stack-based, or 2D point-based data.
cpusched: collect CPU scheduling event-based data.
disklat: collect disk latency event-based data.
malloc()call information (including call graph data and size of allocation) as stack-based data.
memleak: collect outstanding memory allocation data (including call grap data) as stack-based data.
memtime: collect data on memory usage over time as point data.
callstack: collect call stack data.
ipc: collect inter-process communication (IPC) data via TCP as event-based data.
memevents: collect memory accesses (including call graph data) as stack-based data.
diskblockrq: collect disk block accesses as event-based data.
malloc()call information (including call graph data) as stack-based data.
lib: Not yet implemented.
MARPLE allows simultaneous collection using multiple subcommands at once - they are simply passed as multiple arguments, or as a custom collection group. Users can define custom collection groups by using the config file. When using many subcommands, all data will be written to a single file.
usage: sudo marple --display [-h] [-fg | -tm] [-g2 | -tcp] [-hm | -sp] [-i INFILE] [-o OUTFILE] Display collected data in required format optional arguments: -h, --help show this help message and exit -fg, --flamegraph display as flamegraph -tm, --treemap display as treemap -g2, --g2 display as g2 image -tcp, --tcpplot display as TCP plot -hm, --heatmap display as heatmap -sp, --stackplot display as stackplot -i INFILE, --infile INFILE Input file where collected data to display is stored -o OUTFILE, --outfile OUTFILE Output file where the graph is stored
Tree maps and flame graphs can be used to display stack-based data. G2 and TCP plots can be used to display event-based data. Heat maps and stack plots can be used to display 2D point-based data.
In general, MARPLE will not require specification of the display mode - it will determine this itself using defaults in the config file. These can be overriden on a case-by-case basis using the command-line arguments. In particular, if displaying a data file with many data sets, overriding stack-based plots to display as flame graphs by using
-fg will result in all stack-based data in that file being displayed as a flame graph for example.
Note that if collecting and displaying data on the same machine, MARPLE remembers the last data file written - in this case, no display options are necessary and simply invoking
marple -d will give the correct display.
Ease-of-use: the user should not have to be concerned with the underlying tools used to analyse system performance. Instead, they simply specify the part of the system they would like to analyse, and MARPLE does the rest.
Separation of data collection and visualisation: the data collection module writes a data file to disk, which is later used by the visualisation module. Importantly, these two phases do not need to be run on the same machine. This allows data to be collected by one user, and sent off elsewhere for analysis by another user.
The workflow for using MARPLE is as follows:
Collect data - specify areas of interest and duration of collection as command line arguments, and MARPLE will collect data and write it to disk.
(optional) Transfer collected data to another machine - the data can be moved to another machine and still viewed.
Display data - specify the input data file, and MARPLE will automatically select the best visualiser to display this data. Alternatively, defaults can be set in the configuration file or pass as command line arguments.
Data collection tools used by MARPLE
There are a few different tools used to collect data, and the interfaces to these are detailed below.
Sometimes referred to as
perf is a Linux performance-analysis tool built into Linux from kernel version 2.6.31. It has many subcommands and can be difficult to use.
Currently, MARPLE can collect data on the following aspects of the system using
- Memory: load and store events (with call graph information)
- Memory: calls to malloc (with call graph information)
- Stack: stack traces
- CPU: scheduling events
- Disk: block requests
For more information on
perf, see Brendan Gregg's blog.
This is another Linux tool, designed for usage on Linux kernel versions 3.2 and later. It is part of Brendan Gregg's perf-tools, which use perf and another Linux tool called
Iosnoop itself uses
ftrace, and traces disk IO. MARPLE therefore uses
iosnoop to trace disk latency.
eBPF and BCC
[Berkeley Packet Filter](https://prototype-kernel.readthedocs.io/en/latest/bpf/ | Extended Berkeley Packet Filter) (eBPF) is derived from the [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/filter.txt](Berkeley Packet Filter) (BPF, also known as Linux Socket Filtering or LSF). BPF allows a user-space program to filter any socket, blocking and allowing data based on any criteria. BPF programs are executed in-kernel by a sandboxed virtual machine. Linux makes this process relatively straightforward compared to other UNIX flavours such as BSD - the user simply has to write the code for the filter, then send it to the kernel using a specific function call.
eBPF has extended BPF to give a general in-kernel virtual machine, with hooks throughout the kernel allowing it to trace more than just packets, and do more actions than simple filtering. The user writes a program, which is compiled to BPF bytecode and sent to the kernel. The kernel then ensures that an eBPF program cannot harm the system, using static analysis before loading to ensure that it terminates and is safe to execute. It then uses BPF to execute the program, making use of kernel-/user-space probes for dynamic tracing (kprobes and uprobes respectively) and other tracepoints for static tracing and more (see the eBPF workflow diagram). The eBPF programs also use eBPF maps, a generic data structure for storing resulting information. A single eBPF program can be attached to many events, and many programs can use a single map (and vice-versa). Linux kernel versions 3.15 and onwards support eBPF.
Figure: The worflow of using eBPF.
The BPF Compiler Collection (BCC) is a "toolkit for creating efficient kernel tracing and manipulation programs", generally requiring Linux kernel version 4.1 or higher. BCC uses eBPF extensively, and makes it simpler to write eBPF programs by providing a Python interfaces to the various tools. The eBPF programs are written in C, and handled in strings in the BCC code. Many useful example tools are provided too (see the diagram below).
Figure: The BCC tools collection.
Currently, MARPLE uses the following BCC tools:
malloc()function calls for memory allocation, and shows call stacks and the total number of bytes allocated.
memleak- traces the top outstanding memory allocations, allowing for detection of memory leaks.
tcptracer- traces TCP
close()system call. In MARPLE this is used to monitor inter-process communication (IPC) which uses TCP.
For more information on these tools, see Brendan Gregg's blog.
smem memory reporting tool allows the user to track physical memory usage, and takes into account shared memory pages (unlike many other tools). It requires a Linux kernel which provides the "Proportional Set Size" (PSS) metric for reporting a process' proportion of shared memory - this is usually version 2.6.27 or higher. MARPLE uses
smem to regularly sample memory usage.
The tool has a very high latency - it was initially used as it helpfully sorts the processes recorded by memory usage. This is not strictly necessary however, so a command such as
top could be used in its place. Alternatively, the
smemcap is a lightweight version designed to run on resource-constrained systems - see the bottom of the man page.
Data visualisation tools used by MARPLE
Brendan Gregg's flame graphs are a useful visualisation for profiled software, as they show call stack data. An example taken from Brendan Gregg's website is below. Flame graphs are interactive, allowing the user to zoom in to a certain process.
Figure: An example flame graph, showing CPU usage for a MySQL program.
In general any form of stack data is easily viewed as a flame graph.
Tree maps, like flamegraphs, are a useful visualisation tool for call stack data. Unlike flame graphs, which show all levels of the current call stack at any one time, tree maps display only a single level, giving users a more detailed view of that level and the opportunity to drill down into deeper levels. These tree maps are interactive, allowing the user to see useful information on hover, and drill down into each node of the tree. An example tree map
.html file can be found here.
Heat maps allow three-dimensional data to be visualised, using colour as a third dimension. In MARPLE, heat maps are used as histograms - data is grouped into buckets, and colour is used to represent the amount of data falling in each bucket.
Figure: An example heat map, showing disk latency data.
G2 is an event-log viewer. It is very efficient and scalable, and is part of the Vector Packet Processing (VPP) platform. This is a framework providing switch/router functionality, and is the open-source version of Cisco's VPP technology.
See below for an example G2 window.
Figure: An example G2 window, showing scheduling event data.
G2 has many display features, a few of these are listed below.
- Pressing the 'c' key will colour the graph.
- Multiple 'tracks' of events are displayed, and their labels are shown on the left. In the figure above, the tracks are CPU cores.
- The events are ordered chronologically, with a timescale at the bottom of the screen.
- Event types can be hidden/shown using the checkboxes on the right.
- There are navigational and search controls along the bottom of the screen, and in the lower-right conrner.
For more information, consult the G2 wiki.
Stack plots (also known as stacked graphs or stacked charts) show 'parts to the whole'. They are effectively line graphs subdivided to show the various components that make up the overall line value. They can be seen as a form of pie chart, but one that changes with respect to another dimension (usually time). An example stack plot is shown below.
Figure: An example stack plot, showing memory usage by process over time.
- MARPLE logs can be found in