gpulink

A simple tool for monitoring and displaying GPU stats


Keywords
gpu, monitoring, nvidia, python
License
MIT
Install
pip install gpulink==0.2.0

Documentation

gpulink

Downloads PythonTest

A library and command-line tool for monitoring NVIDIA GPU stats.
gpulink uses pynvml - a Python wrapper for the NVIDIA Management Library (NVML).

Current status

⚠ gpulink is in a very early state - breaking changes between versions are possible!

Requirements

gpulink requires the NVIDIA Management Library to be installed which is shipped together with nvidia-smi.

Installation

Installation using PIP

To install gpulink using the Python Package Manager (PIP) run:
pip install gpulink

Using from source

gpulink can also be used from source. For this, perform the following steps to create a Python environment and to install the requirements:

  1. Create an environment: python -m venv env
  2. Activate the environment: .\env\Scripts\Activate
  3. Install requirements: pip install -r requirements.txt

Command-line usage

gpulink can either be imported as a library or can be used from the command line:

Usage: GPU-Link: Monitor NVIDIA GPUs [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  record   Record GPU properties.
  sensors  Fetch and print the GPU sensor status.

Examples

  • View GPU sensor status: gpulink sensors
╒═══════╀══════════════════╀═════════════════════╀═════════════╀═════════════════╀═══════════════╀═══════════════════╕
β”‚   GPU β”‚ Name             β”‚ Memory [MB]         β”‚   Temp [Β°C] β”‚   Fan speed [%] β”‚ Clock [MHz]   β”‚   Power Usage [W] β”‚
β•žβ•β•β•β•β•β•β•β•ͺ══════════════════β•ͺ═════════════════════β•ͺ═════════════β•ͺ═════════════════β•ͺ═══════════════β•ͺ═══════════════════║
β”‚     0 β”‚ NVIDIA TITAN RTX β”‚ 1809 / 25769 (7.0%) β”‚          34 β”‚              41 β”‚ Graph.: 173   β”‚            26.583 β”‚
β”‚       β”‚                  β”‚                     β”‚             β”‚                 β”‚ Memory: 403   β”‚                   β”‚
β”‚       β”‚                  β”‚                     β”‚             β”‚                 β”‚ SM: 173       β”‚                   β”‚
β”‚       β”‚                  β”‚                     β”‚             β”‚                 β”‚ Video: 540    β”‚                   β”‚
β•˜β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•›
  • Watch GPU sensor status: gpulink sensors -w

Watch sensor status

  • Record the memory usage over time, generate a plot and save it as a png image: gpulink record -o memory.png memory
╒═════╀══════════════════╀══════════════════════╕
β”‚ GPU β”‚ Name             β”‚ Memory used [MB]     β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 0   β”‚ NVIDIA TITAN RTX β”‚ minimum: 1584.754688 β”‚
β”‚     β”‚                  β”‚ maximum: 2204.585984 β”‚
β•˜β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•›
Duration:       2.500       [s]"
Sampling rate:  300.000     [Hz]"

Memory consumption over time

Library usage

gpulink can be easily used within applications. Just import gpulink and create a DeviceCtx. This context manages device access and provides an API for fetching GPU properties (see API example):

import gpulink as gpu

with gpu.DeviceCtx() as ctx:
   print(f"Available GPUs: {ctx.gpus.names}")
   memory_information = ctx.get_memory_info(gpus=ctx.gpus.ids)

Recording data

gpulink provides a Recorder class for recording GPU properties. For simple instantiation use one of the provided factory methods, e.g.:

recorder = gpu.Recorder.create_memory_recorder(ctx, ctx.gpus.ids)

Afterwards a recording can be performed:

Option 1: Using start and stop method (see Basic example)

    recorder.start()
    ... # Do some GPU stuff
    recorder.stop(auto_join=True)

Option 2: Using a context manager (see Context-Manager example)

    with recorder:
    ... # Do some GPU stuff

Option 3: Using a decorator (see Decorator example)

    @record(factory=gpu.Recorder.create_memory_recorder)
    def my_gpu_function():
    ... # Do dome GPU stuff
    
    my_gpu_function()

Once a recording is finished its data can be accessed:

recording = recording = recorder.get_recording()

Plotting data

gpulink provides a Plot class for visualizing recordings using matplotlib:

    from pathlib import Path
    
    # Generate the plot
    plot = gpu.Plot(recording)
    
    # Display the plot
    plot.plot()
    
    # Save the plot as an image
    plot.save(Path("memory.png"))
    
    # The generated Figure and Axis can also be accessed directly
    figure, axis = plot.generate_graph()

Unit testing

When using gpulink inside unit tests, create or use an already existing device mock, e.g. DeviceMock. To create a custom mock class just derive it from the BaseDevice. Then during creating a DeviceCtx provide the mock as follows:

import gpulink as gpu

with gpu.DeviceCtx(device=DeviceMock) as ctx:
   ...

Troubleshooting

  • If you get the error message below, please ensure that the NVIDIA Management Library is installed on you system by typing nvidia-smi --version into a terminal:
    pynvml.nvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found.

Planned features

  • Live-plotting of GPU stats

Changelog

  • 0.4.0
    • Recording arbitrary GPU stats (clock, fan-speed, memory, power-usage, temp)
    • Display GPU name and power usage within sensors command
    • Replaced arparse library by click
    • Aborting a watch or recording command can be done by pressing any key instead of ctrl+c
  • 0.4.1
    • Fix error when calling nvmlDeviceGetName in pynvml version 11.5.0
  • 0.5.0
    • Add context-manager-based recording
    • Add decorator-based recording
  • 0.6.0
    • Remove PlotOptions class
    • Fix imports and update unit tests