Because sometimes burning the GPU is not enough.
A simple tool for running compute- or memory-intensive workloads on both CPU and GPU, in order to understand:
- the performance of the individual components
- the impact of the workloads on one another
- the maximum power consumption of a node
 
Node burn can run GEMM or STREAM workloads on the GPU only, on the CPU only, or on both simultaneously:
```bash
# run GEMM with matrix dimension 5000*5000 on the GPU,
# and STREAM triad with length 500000 on the CPU, for 30 seconds.
./burn -ggemm,5000 -cstream,500000 -d30

# run GEMM on the CPU, nothing on the GPU, for 3 minutes.
./burn -cgemm,5000 -d180

# run GEMM on the GPU, nothing on the CPU, for 20 seconds.
./burn -ggemm,5000 -d20
```

Sometimes we want to run multiple instances of node burn in a parallel job, e.g. 4 instances on a node with 4 GPUs, or one instance on every GPU in a cabinet, to see if anything catches on fire and to understand system behavior under load.
Use the --batch option to produce less verbose output that can be easily parsed by a post-processing script.
```bash
# on a system with 4 GPUs per node, use all 16 GPUs on 4 nodes to
# run GEMM with matrix dimension 10000*10000 on the GPU for 30 seconds
srun -n16 -N4 --gpus-per-task=1 ./burn --batch -ggemm,10000 -d30
```

```
nid001272:gpu    584 iterations, 38930.92 GFlops,     30.0 seconds,    2.400 Gbytes
nid001272:gpu    579 iterations, 38555.99 GFlops,     30.0 seconds,    2.400 Gbytes
nid001272:gpu    561 iterations, 37348.14 GFlops,     30.0 seconds,    2.400 Gbytes
nid001272:gpu    600 iterations, 39939.47 GFlops,     30.0 seconds,    2.400 Gbytes
nid001278:gpu    585 iterations, 38994.35 GFlops,     30.0 seconds,    2.400 Gbytes
nid001278:gpu    584 iterations, 38914.98 GFlops,     30.0 seconds,    2.400 Gbytes
nid001278:gpu    589 iterations, 39200.59 GFlops,     30.1 seconds,    2.400 Gbytes
nid001278:gpu    589 iterations, 39204.37 GFlops,     30.0 seconds,    2.400 Gbytes
nid001274:gpu    557 iterations, 37091.74 GFlops,     30.0 seconds,    2.400 Gbytes
nid001274:gpu    560 iterations, 37289.96 GFlops,     30.0 seconds,    2.400 Gbytes
nid001274:gpu    542 iterations, 36090.85 GFlops,     30.0 seconds,    2.400 Gbytes
nid001274:gpu    503 iterations, 33473.36 GFlops,     30.1 seconds,    2.400 Gbytes
nid001276:gpu    584 iterations, 38929.67 GFlops,     30.0 seconds,    2.400 Gbytes
nid001276:gpu    589 iterations, 39253.24 GFlops,     30.0 seconds,    2.400 Gbytes
nid001276:gpu    588 iterations, 39170.08 GFlops,     30.0 seconds,    2.400 Gbytes
nid001276:gpu    589 iterations, 39224.21 GFlops,     30.0 seconds,    2.400 Gbytes
```
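As a minimal sketch of such post-processing (assuming the batch format shown above, where the node name precedes the colon and the GFlops value is the fourth whitespace-separated field), a short awk pipeline can report the mean GFlops per node:

```bash
# average the GFlops column (field 4) per node; the node name is the part
# of field 1 before the ":" (a sketch, assuming the batch format shown above)
srun -n16 -N4 --gpus-per-task=1 ./burn --batch -ggemm,10000 -d30 |
  awk '{ split($1, id, ":"); sum[id[1]] += $4; n[id[1]]++ }
       END { for (node in sum) printf "%s %.2f GFlops (mean)\n", node, sum[node]/n[node] }'
```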
If running on an HPE Cray-EX system with pm_counters, node burn can be configured to generate a report of power consumption on each node. Enable it at build time with the NB_PMCOUNTERS CMake option (see below). Node burn will generate power reports from all of the energy counters that it can detect on each node; the values reported will vary according to the node architecture.
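For reference, these counters are exposed through a sysfs interface on HPE Cray-EX nodes, so you can inspect what node burn will detect (a sketch; the set of counter files varies by node architecture):

```bash
# list the available pm_counters (names differ between node types)
ls /sys/cray/pm_counters
# each counter is a small text file, e.g. the cumulative node energy in joules
cat /sys/cray/pm_counters/energy
```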
The C++ code requires C++20; the CUDA code requires C++17. Node burn has only been tested with GCC 11+ and CUDA 11.8+. It has not been tested with the Clang, Intel, NVC or Cray compilers, but it should work with any reasonably recent compiler.
 
Node burn uses CMake to configure the build. The NB_GPU option can be used to disable the CUDA targets.
```bash
# by default node burn will attempt to build for CUDA devices.
CC=gcc CXX=g++ cmake $src_path

# explicitly disable building for CUDA
CC=gcc CXX=g++ cmake $src_path -DNB_GPU=off
```

On HPE Cray-EX systems, power readings from pm_counters can be generated by enabling the NB_PMCOUNTERS option.
```bash
# enable pm counters for average power consumption
CC=gcc CXX=g++ cmake $src_path -DNB_PMCOUNTERS=on
```
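Putting the pieces together, a typical out-of-source build might look like the following (a sketch; the source path and the smoke-test parameters are placeholders, and the default generator is assumed):

```bash
# configure, build, and run a short CPU-only GEMM as a sanity check
mkdir build && cd build
CC=gcc CXX=g++ cmake /path/to/node-burn   # add -DNB_GPU=off on CPU-only systems
cmake --build . -j
./burn -cgemm,2000 -d10
```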