Utility for measuring the fraction of time the GIL is held in a program


gil_load

gil_load is a utility for measuring the fraction of time the CPython GIL (Global Interpreter Lock) is held or waited for. It is Linux-only, and has been tested on Python 2.7, 3.5, 3.6 and 3.7.

Installation

To install gil_load, run:

$ sudo pip3 install gil_load

or to install from source:

$ sudo python3 setup.py install

gil_load can also be installed with Python 2.

Introduction

A lot of people complain about the Python GIL, saying that it prevents them from utilising all cores on their expensive CPUs. In my experience this claim is more often than not without merit. This module was motivated by the desire to demonstrate that typical parallel code in Python, such as numerical calculations using numpy, does not suffer from high GIL contention and is truly parallel, utilising all cores. However, in circumstances where the GIL is contested, this module can tell you how contested it is, which threads are hogging the GIL and which are starved.

Usage

In your code, call gil_load.init() before starting any threads. When you wish to begin monitoring, call gil_load.start(). When you want to stop monitoring, call gil_load.stop(). You can thus monitor a small segment of code, which is useful if your program is idle most of the time and you only need to profile when something is actually happening. Multiple calls to gil_load.start() and gil_load.stop() can accumulate statistics over time. See the arguments of gil_load.start() for more details.

You may either pass arguments to gil_load.start() configuring it to output monitoring results periodically to a file (such as sys.stdout), or you may manually collect statistics by calling gil_load.get().

For example, here is some code that runs four threads doing fast Fourier transforms with numpy:

import numpy as np
import threading
import gil_load

N_THREADS = 4
NPTS = 4096

gil_load.init()

def do_some_work():
    for i in range(2):
        x = np.random.randn(NPTS, NPTS)
        x[:] = np.fft.fft2(x).real

gil_load.start()

threads = []
for i in range(N_THREADS):
    thread = threading.Thread(target=do_some_work, daemon=True)
    threads.append(thread)
    thread.start()


for thread in threads:
    thread.join()

gil_load.stop()

stats = gil_load.get()
print(gil_load.format(stats))

To run the script, you must launch it with gil_load, like so:

python -m gil_load example.py

This runs (on my computer) for about 5 seconds, and prints:

held: 0.004 (0.004, 0.004, 0.004)
wait: 0.0 (0.0, 0.0, 0.0)
  <140125322438464>
    held: 0.0 (0.0, 0.0, 0.0)
    wait: 0.0 (0.0, 0.0, 0.0)
  <140124982937344>
    held: 0.0 (0.0, 0.0, 0.0)
    wait: 0.0 (0.0, 0.0, 0.0)
  <140124974544640>
    held: 0.0 (0.0, 0.0, 0.0)
    wait: 0.0 (0.0, 0.0, 0.0)
  <140124966151936>
    held: 0.001 (0.001, 0.001, 0.001)
    wait: 0.0 (0.0, 0.0, 0.0)
  <140124957759232>
    held: 0.003 (0.003, 0.003, 0.003)
    wait: 0.0 (0.0, 0.0, 0.0)

This output shows the total and per-thread averages of the fraction of time the GIL was held, as well as the 1m, 5m and 15m exponential moving averages thereof. It shows that for this script, the GIL was held 0.4% of the time, and contested ≈0% of the time.

How it works

In order to minimise the overhead of profiling, gil_load is a sampling profiler. It waits for random amounts of time and then samples the situation: which thread, if any, is holding the GIL, and which threads are waiting for it. This builds up statistics over time, but does mean that the answers are only accurate once many samples have been taken. The default mean sampling interval is 5 ms, and gil_load draws each interval from an exponential distribution with this mean, in order to avoid the systematic errors that perfectly regular timing might introduce. Thus, one can only trust profiling results if the duration of profiling is large compared to the mean sampling interval.
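The sketch below illustrates this sampling scheme. It is not gil_load's internal code: check_gil_state() is a hypothetical stand-in for the real check of which thread holds the GIL, and the loop simply estimates the held fraction from the samples.

import random
import time

AV_SAMPLE_INTERVAL = 0.005  # mean sampling interval in seconds (the default)

def sample_gil(check_gil_state, duration):
    # Toy sampling loop: sleep for exponentially distributed intervals,
    # then record whether the GIL was held at that instant.
    held_samples = 0
    total_samples = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        # An exponentially distributed interval with the given mean avoids
        # aliasing against any periodic behaviour in the profiled program.
        time.sleep(random.expovariate(1.0 / AV_SAMPLE_INTERVAL))
        held_samples += bool(check_gil_state())
        total_samples += 1
    return held_samples / max(total_samples, 1)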

gil_load uses LD_PRELOAD to override some system calls so that it can detect when a thread acquires or releases the GIL. This is why the script must be run with python -m gil_load my_script.py: it lets gil_load set LD_PRELOAD before running your script.
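Conceptually, the launcher does something like the following before handing control to your script. This is only a rough sketch of the LD_PRELOAD mechanism, not gil_load's actual launcher, and the library path shown is hypothetical.

import os
import sys

PRELOAD_LIB = '/path/to/preload_library.so'  # hypothetical path; the real library ships inside gil_load

if 'LD_PRELOAD' not in os.environ:
    # Set LD_PRELOAD and re-exec the interpreter, so that the dynamic linker
    # loads the override library before any of the intercepted calls are made.
    os.environ['LD_PRELOAD'] = PRELOAD_LIB
    os.execv(sys.executable, [sys.executable] + sys.argv)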

Command line and function documentation

To run with monitoring enabled, run your script with:

python -m gil_load [args] my_script.py

Any arguments will be passed to the Python interpreter running your script.

gil_load.init() :

Find the data structure for the GIL in memory so that we can monitor it later to see how often it is held. This function must be called before any other threads are started, and before calling gil_load.start(). Note: this function calls PyEval_InitThreads(), so if your application was single-threaded, it will take a slight performance hit from this, as the Python interpreter is not quite as efficient in multithreaded mode as it is in single-threaded mode, even if there is only one thread running.

gil_load.test() :

Test that the code can in fact determine whether the GIL is held for your Python interpreter. Raises AssertionError on failure, returns True on success. Must be called after gil_load.init().
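For example, a quick sanity check at the top of a program might look like this (following the documented behaviour above):

import gil_load

gil_load.init()
gil_load.test()  # raises AssertionError if the GIL state cannot be determined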

gil_load.start(av_sample_interval=0.005, output_interval=5, output=None, reset_counts=False):

Start monitoring the GIL. Monitoring runs in a separate thread (running only C code, so as not to require the GIL itself), which checks at random times whether the GIL is held. The interval between sampling times is exponentially distributed with mean set by av_sample_interval. Over time, statistics are accumulated for what proportion of the time the GIL was held. The overall load, as well as 1 minute, 5 minute and 15 minute exponential moving averages, are computed.

If output is not None, then it should be an open file (e.g. sys.stdout), a filename (which will be opened in append mode), or a file descriptor. The average GIL load will be written to this file approximately every output_interval seconds.

If reset_counts is True, then the statistics accumulated by previous calls to start() and stop() will be cleared. If you do not clear the counts, then you can repeatedly sample the GIL usage of just a small segment of your code by wrapping it with calls to start() and stop(). Due to the exponential distribution of sampling intervals, this accumulates accurate statistics even if the wrapped code takes less time to run than av_sample_interval. However, each call to start() starts a new thread, the overhead of which may make profiling very short segments of code inaccurate.
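For example, to monitor only a particular section of code and have a summary written to stdout roughly every two seconds (a small sketch using only the arguments documented above; expensive_work() is a placeholder for your own code):

import sys
import time
import gil_load

def expensive_work():
    # Placeholder workload; substitute the code you actually want to profile.
    time.sleep(10)

gil_load.init()

gil_load.start(av_sample_interval=0.01, output_interval=2, output=sys.stdout)
expensive_work()
gil_load.stop()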

gil_load.stop():

Stop monitoring the GIL. Accumulated statistics can then be accessed with gil_load.get().

gil_load.get():

Returns a 2-tuple:

    (total_stats, thread_stats)

Where total_stats is a dict:

    {
        'held': held,
        'held_1m': held_1m,
        'held_5m': held_5m,
        'held_15m': held_15m,
        'wait': wait,
        'wait_1m': wait_1m,
        'wait_5m': wait_5m,
        'wait_15m': wait_15m,
    }

where held is the total fraction of the time that the GIL has been held, wait is the total fraction of the time the GIL was being waited on, and the _1m, _5m and _15m suffixed entries are the 1, 5, and 15 minute exponential moving averages of the held and wait fractions.

thread_stats is a dict of the form:

    {thread_id: per_thread_stats}

where each per_thread_stats is a dictionary with the same keys as total_stats, but pertaining only to the given thread.
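For example, continuing from the script above (after gil_load.stop() has been called), the statistics can be consumed directly rather than via gil_load.format(), using the keys documented above:

total_stats, thread_stats = gil_load.get()

print('GIL held (overall):', total_stats['held'])
print('GIL waited on (overall):', total_stats['wait'])

for thread_id, stats in sorted(thread_stats.items()):
    print('thread', thread_id, 'held:', stats['held'], 'wait:', stats['wait'])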

gil_load.format(stats, N=3):

Format statistics as returned by gil_load.get() for printing, with all numbers rounded to N digits. Format is:

    held: <average> (1m, 5m, 15m)
    wait: <average> (1m, 5m, 15m)
      <thread_id>
        held: <average> (1m, 5m, 15m)
        wait: <average> (1m, 5m, 15m)
      ...