FluidNumerics/gpu-healthchecks

GPU Health Check Dashboard

Copyright 2025 Fluid Numerics

Installation

Create and activate a conda environment, then install the Python dependencies:

conda create -n gpu-healthchecks python=3.11
conda activate gpu-healthchecks
pip install -r requirements.txt

Build rocm-amdgpu-bench:

mkdir -p rocm-amdgpu-bench/build
cd rocm-amdgpu-bench/build
cmake ..
make

Running

To run the benchmarks during an interactive job, run:

python benchmark_gpus.py

To submit a batch job on a single node (e.g., nicholson), use the example batch script provided at slurm/nicholson.sh.

[!WARNING] gfx950 is not supported before ROCm 6.4.0. If using a ROCm version < 6.4.0, remove the gfx950 target on line 25 in rocm-amdgpu-bench/CMakeLists.txt.
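The version condition in the warning above can be checked before building. The following is a minimal sketch, not part of this repository: the function name `rocm_supports_gfx950` is hypothetical, it assumes you pass the installed ROCm version as a plain `major.minor.patch` string, and it relies on `sort -V` from GNU coreutils. How you obtain the installed version depends on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by this repository).
# Succeeds (exit status 0) when the given ROCm version is 6.4.0 or newer,
# i.e. when the gfx950 target can stay in rocm-amdgpu-bench/CMakeLists.txt.
rocm_supports_gfx950() {
    local version="$1"
    local minimum="6.4.0"
    local lowest
    # sort -V (GNU coreutils) orders version strings numerically per field.
    # If the minimum sorts first (or both are equal), the version is new enough.
    lowest="$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n 1)"
    [ "$lowest" = "$minimum" ]
}
```

For example, `rocm_supports_gfx950 "6.3.1" || echo "Remove the gfx950 target before running cmake"` prints the reminder, because 6.3.1 is older than 6.4.0.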
