PhilipKlaus/gpulink

gpulink

A library and command-line tool for monitoring NVIDIA GPU stats.
gpulink uses pynvml, a Python wrapper for the NVIDIA Management Library (NVML).

Current status

⚠ gpulink is at a very early stage of development; breaking changes between versions are possible!

Requirements

gpulink requires the NVIDIA Management Library (NVML) to be installed. NVML ships together with nvidia-smi.

Installation

Installation using PIP

To install gpulink with pip, the Python package installer, run:
pip install gpulink

Using from source

gpulink can also be used from source. To do so, create a Python environment and install the requirements:

  1. Create an environment: python -m venv env
  2. Activate the environment: .\env\Scripts\Activate (Windows) or source env/bin/activate (Linux/macOS)
  3. Install requirements: pip install -r requirements.txt

Command-line usage

gpulink can be imported as a library or used from the command line:

Usage: GPU-Link: Monitor NVIDIA GPUs [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  record   Record GPU properties.
  sensors  Fetch and print the GPU sensor status.

Examples

  • View GPU sensor status: gpulink sensors
╒═══════╤══════════════════╤═════════════════════╤═════════════╤═════════════════╤═══════════════╤═══════════════════╕
│   GPU │ Name             │ Memory [MB]         │   Temp [°C] │   Fan speed [%] │ Clock [MHz]   │   Power Usage [W] │
╞═══════╪══════════════════╪═════════════════════╪═════════════╪═════════════════╪═══════════════╪═══════════════════╡
│     0 │ NVIDIA TITAN RTX │ 1809 / 25769 (7.0%) │          34 │              41 │ Graph.: 173   │            26.583 │
│       │                  │                     │             │                 │ Memory: 403   │                   │
│       │                  │                     │             │                 │ SM: 173       │                   │
│       │                  │                     │             │                 │ Video: 540    │                   │
╘═══════╧══════════════════╧═════════════════════╧═════════════╧═════════════════╧═══════════════╧═══════════════════╛
  • Watch GPU sensor status: gpulink sensors -w

[Figure: Watch sensor status]

  • Record the memory usage over time, generate a plot and save it as a png image: gpulink record -o memory.png memory
╒═════╤══════════════════╤══════════════════════╕
│ GPU │ Name             │ Memory used [MB]     │
├─────┼──────────────────┼──────────────────────┤
│ 0   │ NVIDIA TITAN RTX │ minimum: 1584.754688 │
│     │                  │ maximum: 2204.585984 │
╘═════╧══════════════════╧══════════════════════╛
Duration:       2.500       [s]
Sampling rate:  300.000     [Hz]

[Figure: Memory consumption over time]
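
The summary printed by the record command (minimum, maximum, duration, sampling rate) can be derived from raw samples. The following is a minimal sketch and not gpulink's implementation: the summarize function and the (timestamp, bytes) sample layout are assumptions made for illustration. The conversion 1 MB = 10^6 bytes is also an assumption, chosen because it is consistent with the figures in the tables above.

```python
from typing import Dict, List, Tuple


def summarize(samples: List[Tuple[float, int]]) -> Dict[str, float]:
    """Summarize (timestamp_seconds, used_bytes) samples.

    Hypothetical helper for illustration; not part of gpulink.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    timestamps = [t for t, _ in samples]
    duration = timestamps[-1] - timestamps[0]
    if duration <= 0:
        raise ValueError("timestamps must be increasing")
    values_mb = [b / 1e6 for _, b in samples]  # assumption: 1 MB = 10**6 bytes
    return {
        "minimum_mb": min(values_mb),
        "maximum_mb": max(values_mb),
        "duration_s": duration,
        # Number of sampling intervals divided by the elapsed time.
        "sampling_rate_hz": (len(samples) - 1) / duration,
    }


# Synthetic data: four samples taken 0.5 s apart.
summary = summarize([
    (0.0, 1_500_000_000),
    (0.5, 2_000_000_000),
    (1.0, 1_750_000_000),
    (1.5, 2_250_000_000),
])
print(summary)
```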

Library usage

gpulink can be easily used within applications. Just import gpulink and create a DeviceCtx. This context manages device access and provides an API for fetching GPU properties (see API example):

import gpulink as gpu

with gpu.DeviceCtx() as ctx:
   print(f"Available GPUs: {ctx.gpus.names}")
   memory_information = ctx.get_memory_info(gpus=ctx.gpus.ids)

Recording data

gpulink provides a Recorder class for recording GPU properties. For simple instantiation use one of the provided factory methods, e.g.:

recorder = gpu.Recorder.create_memory_recorder(ctx, ctx.gpus.ids)

Afterwards a recording can be performed:

Option 1: Using the start and stop methods (see Basic example)

    recorder.start()
    ... # Do some GPU stuff
    recorder.stop(auto_join=True)

Option 2: Using a context manager (see Context-Manager example)

    with recorder:
        ... # Do some GPU stuff

Option 3: Using a decorator (see Decorator example)

    @record(factory=gpu.Recorder.create_memory_recorder)
    def my_gpu_function():
        ... # Do some GPU stuff
    
    my_gpu_function()

Once a recording is finished its data can be accessed:

recording = recorder.get_recording()
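
The three recording styles above (start/stop, context manager, decorator) can all be layered on a single background sampling thread. The sketch below illustrates that general pattern using only the standard library. SamplingRecorder, record_with, and the read_value callable are hypothetical names invented for this example; gpulink's actual Recorder may be implemented differently.

```python
import functools
import threading
import time
from typing import Callable, List, Tuple


class SamplingRecorder:
    """Hypothetical recorder that polls `read_value` on a background thread."""

    def __init__(self, read_value: Callable[[], float], interval_s: float = 0.01):
        self._read_value = read_value
        self._interval_s = interval_s
        self._stop_event = threading.Event()
        self._thread = None
        self.samples: List[Tuple[float, float]] = []

    def _run(self) -> None:
        # Take one sample immediately, then one per interval until stopped.
        while True:
            self.samples.append((time.monotonic(), self._read_value()))
            if self._stop_event.wait(self._interval_s):
                break

    def start(self) -> None:
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()

    # Context-manager style: start on entry, stop on exit.
    def __enter__(self) -> "SamplingRecorder":
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.stop()


def record_with(recorder: SamplingRecorder):
    """Decorator style: record for the duration of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with recorder:
                return func(*args, **kwargs)
        return wrapper
    return decorator


recorder = SamplingRecorder(read_value=lambda: 42.0)


@record_with(recorder)
def workload() -> str:
    time.sleep(0.05)  # stand-in for GPU work
    return "done"


result = workload()
print(f"{result}: collected {len(recorder.samples)} samples")
```

Building the decorator on top of the context manager keeps the start/stop logic in one place, so a recording is stopped even if the decorated function raises.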

Plotting data

gpulink provides a Plot class for visualizing recordings using matplotlib:

    from pathlib import Path
    
    # Generate the plot
    plot = gpu.Plot(recording)
    
    # Display the plot
    plot.plot()
    
    # Save the plot as an image
    plot.save(Path("memory.png"))
    
    # The generated Figure and Axis can also be accessed directly
    figure, axis = plot.generate_graph()

Unit testing

When using gpulink inside unit tests, create a device mock or use an existing one, e.g. DeviceMock. To create a custom mock class, derive it from BaseDevice. Then pass the mock when creating a DeviceCtx:

import gpulink as gpu

with gpu.DeviceCtx(device=DeviceMock) as ctx:
   ...
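
Passing the device class into the context is a form of dependency injection: the code under test talks to whatever device it is given. The sketch below shows the pattern in isolation. FakeBaseDevice, FakeDeviceMock, FakeCtx, and their methods are hypothetical stand-ins written for this example; they are not gpulink's BaseDevice, DeviceMock, or DeviceCtx, whose interfaces may differ.

```python
from abc import ABC, abstractmethod
from typing import List, Type


class FakeBaseDevice(ABC):
    """Hypothetical device interface (not gpulink's BaseDevice)."""

    @abstractmethod
    def setup(self) -> None: ...

    @abstractmethod
    def shutdown(self) -> None: ...

    @abstractmethod
    def get_names(self) -> List[str]: ...


class FakeDeviceMock(FakeBaseDevice):
    """Mock that returns canned data and logs lifecycle calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def setup(self) -> None:
        self.calls.append("setup")

    def shutdown(self) -> None:
        self.calls.append("shutdown")

    def get_names(self) -> List[str]:
        return ["MOCK GPU 0", "MOCK GPU 1"]


class FakeCtx:
    """Hypothetical context that instantiates the injected device class."""

    def __init__(self, device: Type[FakeBaseDevice]) -> None:
        self.device = device()

    def __enter__(self) -> "FakeCtx":
        self.device.setup()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.device.shutdown()

    @property
    def names(self) -> List[str]:
        return self.device.get_names()


with FakeCtx(device=FakeDeviceMock) as ctx:
    names = ctx.names

calls = ctx.device.calls
print(names, calls)
```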

Troubleshooting

  • If you get the error message below, ensure that the NVIDIA Management Library is installed on your system by running nvidia-smi --version in a terminal:
    pynvml.nvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found.
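
A first check can also be scripted. The following sketch uses only the Python standard library to look for the nvidia-smi executable on the PATH; the helper name find_nvidia_smi is hypothetical. Finding the executable suggests, but does not prove, that the NVML shared library is installed alongside it.

```python
import shutil
from typing import Optional


def find_nvidia_smi() -> Optional[str]:
    """Return the full path of nvidia-smi if it is on the PATH, else None."""
    return shutil.which("nvidia-smi")


path = find_nvidia_smi()
if path is None:
    message = "nvidia-smi not found on PATH; the NVIDIA driver and NVML may be missing."
else:
    message = f"nvidia-smi found at {path}"
print(message)
```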

Planned features

  • Live-plotting of GPU stats

Changelog

  • 0.4.0
    • Recording arbitrary GPU stats (clock, fan-speed, memory, power-usage, temp)
    • Display GPU name and power usage within sensors command
    • Replaced the argparse library with click
    • A watch or recording command can be aborted by pressing any key instead of Ctrl+C
  • 0.4.1
    • Fix error when calling nvmlDeviceGetName in pynvml version 11.5.0
  • 0.5.0
    • Add context-manager-based recording
    • Add decorator-based recording
  • 0.6.0
    • Remove PlotOptions class
    • Fix imports and update unit tests
