gpulink

A simple tool for monitoring and displaying GPU stats


Keywords
gpu, monitoring, nvidia, python
License
MIT
Install
pip install gpulink==0.6.0.1

Documentation


A library and command-line tool for monitoring NVIDIA GPU stats.
gpulink uses pynvml - a Python wrapper for the NVIDIA Management Library (NVML).

Current status

⚠ gpulink is in a very early state - breaking changes between versions are possible!

Requirements

gpulink requires the NVIDIA Management Library (NVML) to be installed. NVML is shipped together with nvidia-smi.

Installation

Installation using pip

To install gpulink with pip, the Python package installer, run:
pip install gpulink

Using from source

gpulink can also be used from source. For this, perform the following steps to create a Python environment and to install the requirements:

  1. Create an environment: python -m venv env
  2. Activate the environment: .\env\Scripts\Activate (Windows) or source env/bin/activate (Linux/macOS)
  3. Install requirements: pip install -r requirements.txt

Command-line usage

gpulink can either be imported as a library or used from the command line:

Usage: GPU-Link: Monitor NVIDIA GPUs [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  record   Record GPU properties.
  sensors  Fetch and print the GPU sensor status.

Examples

  • View GPU sensor status: gpulink sensors
โ•’โ•โ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ••
โ”‚   GPU โ”‚ Name             โ”‚ Memory [MB]         โ”‚   Temp [ยฐC] โ”‚   Fan speed [%] โ”‚ Clock [MHz]   โ”‚   Power Usage [W] โ”‚
โ•žโ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ก
โ”‚     0 โ”‚ NVIDIA TITAN RTX โ”‚ 1809 / 25769 (7.0%) โ”‚          34 โ”‚              41 โ”‚ Graph.: 173   โ”‚            26.583 โ”‚
โ”‚       โ”‚                  โ”‚                     โ”‚             โ”‚                 โ”‚ Memory: 403   โ”‚                   โ”‚
โ”‚       โ”‚                  โ”‚                     โ”‚             โ”‚                 โ”‚ SM: 173       โ”‚                   โ”‚
โ”‚       โ”‚                  โ”‚                     โ”‚             โ”‚                 โ”‚ Video: 540    โ”‚                   โ”‚
โ•˜โ•โ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•›
  • Watch GPU sensor status: gpulink sensors -w

[Image: Watch sensor status]

  • Record the memory usage over time, generate a plot and save it as a png image: gpulink record -o memory.png memory
โ•’โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ••
โ”‚ GPU โ”‚ Name             โ”‚ Memory used [MB]     โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 0   โ”‚ NVIDIA TITAN RTX โ”‚ minimum: 1584.754688 โ”‚
โ”‚     โ”‚                  โ”‚ maximum: 2204.585984 โ”‚
โ•˜โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•›
Duration:       2.500       [s]"
Sampling rate:  300.000     [Hz]"

[Image: Memory consumption over time]
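
The summary values printed by the record command (minimum, maximum, duration, sampling rate) can be derived from any list of timestamped samples. The following is a minimal sketch of that arithmetic, not gpulink's own code: the summarize helper and its input format are hypothetical, and it assumes the sampling rate is the number of samples divided by the elapsed time.

```python
from typing import Dict, Sequence, Tuple


def summarize(samples: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize (timestamp [s], memory used [MB]) samples.

    Hypothetical helper for illustration only. Expects at least two
    samples ordered by increasing timestamp.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    timestamps = [t for t, _ in samples]
    values = [v for _, v in samples]
    duration = timestamps[-1] - timestamps[0]
    if duration <= 0:
        raise ValueError("timestamps must be increasing")
    return {
        "minimum": min(values),
        "maximum": max(values),
        "duration_s": duration,
        # Assumption: rate = number of samples / elapsed time.
        "sampling_rate_hz": len(samples) / duration,
    }


# Four evenly spaced samples covering 1.5 seconds.
stats = summarize([(0.0, 1600.0), (0.5, 1800.0), (1.0, 2200.0), (1.5, 1700.0)])
print(stats)
```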

Library usage

gpulink is easy to use within applications: import gpulink and create a DeviceCtx. This context manages device access and provides an API for fetching GPU properties (see API example):

import gpulink as gpu

with gpu.DeviceCtx() as ctx:
    # List the names of all detected GPUs
    print(f"Available GPUs: {ctx.gpus.names}")
    # Fetch memory information for every detected GPU
    memory_information = ctx.get_memory_info(gpus=ctx.gpus.ids)

Recording data

gpulink provides a Recorder class for recording GPU properties. For simple instantiation use one of the provided factory methods, e.g.:

recorder = gpu.Recorder.create_memory_recorder(ctx, ctx.gpus.ids)

Afterwards a recording can be performed:

Option 1: Using the start and stop methods (see Basic example)

    recorder.start()
    ... # Do some GPU stuff
    recorder.stop(auto_join=True)

Option 2: Using a context manager (see Context-Manager example)

    with recorder:
        ... # Do some GPU stuff

Option 3: Using a decorator (see Decorator example)

    @record(factory=gpu.Recorder.create_memory_recorder)
    def my_gpu_function():
        ... # Do some GPU stuff

    my_gpu_function()
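
One common way to offer all three styles is to implement start and stop once and layer the context manager and the decorator on top of them. The sketch below illustrates that layering with a toy recorder that polls a callable on a background thread. SketchRecorder and record_with are hypothetical names, and this is not gpulink's implementation.

```python
import functools
import threading
import time
from typing import Callable, List, Optional


class SketchRecorder:
    """Toy recorder that polls `read_value` on a background thread.

    Illustrative stand-in only; it is not gpulink's Recorder.
    """

    def __init__(self, read_value: Callable[[], float], interval: float = 0.01):
        self._read_value = read_value
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.samples: List[float] = []

    def _run(self) -> None:
        # Take one sample, then wait; leave the loop once stop() sets the event.
        while True:
            self.samples.append(self._read_value())
            if self._stop_event.wait(self._interval):
                break

    def start(self) -> None:
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()

    # Context-manager style: start on entry, stop on exit.
    def __enter__(self) -> "SketchRecorder":
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.stop()


def record_with(recorder: SketchRecorder):
    """Decorator style: record for the duration of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with recorder:
                return func(*args, **kwargs)
        return wrapper
    return decorator


sketch_recorder = SketchRecorder(read_value=time.perf_counter)


@record_with(sketch_recorder)
def busy_work() -> int:
    time.sleep(0.05)  # Placeholder for real GPU work
    return 42


result = busy_work()
print(result, len(sketch_recorder.samples))
```

Because the decorator reuses the context manager, which in turn reuses start and stop, the sampling logic lives in a single place.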

Once a recording is finished, its data can be accessed:

recording = recorder.get_recording()

Plotting data

gpulink provides a Plot class for visualizing recordings using matplotlib:

    from pathlib import Path
    
    # Generate the plot
    plot = gpu.Plot(recording)
    
    # Display the plot
    plot.plot()
    
    # Save the plot as an image
    plot.save(Path("memory.png"))
    
    # The generated Figure and Axis can also be accessed directly
    figure, axis = plot.generate_graph()

Unit testing

When using gpulink inside unit tests, use an existing device mock (e.g. DeviceMock) or create your own. To create a custom mock class, derive it from BaseDevice. Then pass the mock when creating a DeviceCtx:

import gpulink as gpu

with gpu.DeviceCtx(device=DeviceMock) as ctx:
    ...
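
The pattern behind this is dependency injection: the context receives a device class and instantiates it, so a test can swap in a class that returns fixed values. The sketch below shows the idea in isolation. gpulink's actual BaseDevice interface is not described here, so every class and method name in the sketch (FakeBaseDevice, FakeDeviceMock, FakeDeviceCtx, memory_used_mb, and so on) is a hypothetical stand-in.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Type


class FakeBaseDevice(ABC):
    """Hypothetical device interface; not gpulink's BaseDevice."""

    @abstractmethod
    def gpu_names(self) -> List[str]:
        ...

    @abstractmethod
    def memory_used_mb(self, gpu_id: int) -> float:
        ...


class FakeDeviceMock(FakeBaseDevice):
    """Returns fixed values so that tests need no physical GPU."""

    def gpu_names(self) -> List[str]:
        return ["MOCK GPU 0"]

    def memory_used_mb(self, gpu_id: int) -> float:
        return 1024.0


class FakeDeviceCtx:
    """Context that instantiates whichever device class it is given."""

    def __init__(self, device: Type[FakeBaseDevice]):
        self._device_cls = device
        self._device: Optional[FakeBaseDevice] = None

    def __enter__(self) -> "FakeDeviceCtx":
        self._device = self._device_cls()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self._device = None

    def get_memory_used(self, gpu_id: int) -> float:
        if self._device is None:
            raise RuntimeError("context is not active")
        return self._device.memory_used_mb(gpu_id)


# In a unit test, the mock class is passed in place of a real device.
with FakeDeviceCtx(device=FakeDeviceMock) as fake_ctx:
    used = fake_ctx.get_memory_used(0)

print(used)
```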

Troubleshooting

  • If you get the error message below, please ensure that the NVIDIA Management Library is installed on your system by typing nvidia-smi --version into a terminal:
    pynvml.nvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found.
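
The same check can be scripted with the Python standard library. The sketch below is a rough diagnostic, not part of gpulink: nvml_diagnostics is a hypothetical helper, and the library names it searches for ("nvidia-ml" on Linux, "nvml" on Windows) are assumptions about how the NVML shared library is named on each platform.

```python
import ctypes.util
import shutil
import sys
from typing import Dict, Optional


def nvml_diagnostics() -> Dict[str, Optional[str]]:
    """Report where nvidia-smi and the NVML shared library are found.

    Hypothetical helper; a value of None means the item was not located.
    """
    # Assumption: the library is libnvidia-ml on Linux and nvml.dll on Windows.
    library_name = "nvml" if sys.platform == "win32" else "nvidia-ml"
    return {
        # Full path of the nvidia-smi executable if it is on PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
        # Name or path of the shared library if the system loader can see it.
        "nvml_library": ctypes.util.find_library(library_name),
    }


report = nvml_diagnostics()
for item, location in report.items():
    print(f"{item}: {location or 'not found'}")
```

A result of "not found" for both entries suggests that the NVIDIA driver is missing or not visible from the current environment.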

Planned features

  • Live-plotting of GPU stats

Changelog

  • 0.4.0
    • Recording arbitrary GPU stats (clock, fan-speed, memory, power-usage, temp)
    • Display GPU name and power usage within sensors command
    • Replaced the argparse library with click
    • A watch or recording command can now be aborted by pressing any key instead of Ctrl+C
  • 0.4.1
    • Fix error when calling nvmlDeviceGetName in pynvml version 11.5.0
  • 0.5.0
    • Add context-manager-based recording
    • Add decorator-based recording
  • 0.6.0
    • Remove PlotOptions class
    • Fix imports and update unit tests