This project provides scripts to benchmark GPU memory latency and bandwidth using PyTorch. The results are visualized in the form of plots for different GPU models.
- Timing is done using CUDA event timers to ensure accurate GPU-side measurements. Each operation is repeated several times, and the mean and standard deviation are reported (a minimal timing sketch follows this list).
- Measurements are performed for a range of tensor sizes (from a few bytes up to several GiB).
- Results are averaged over several runs.
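In PyTorch, this timing pattern might look like the following minimal sketch; the helper name `time_gpu_op` and the iteration count are illustrative, not the repository's actual API:

```python
import torch

def time_gpu_op(op, n_iters=10):
    """Time a GPU operation with CUDA events; returns (mean, std) in ms."""
    times = []
    for _ in range(n_iters):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        op()
        end.record()
        # Block until the GPU has finished before reading the timer.
        torch.cuda.synchronize()
        times.append(start.elapsed_time(end))  # milliseconds
    t = torch.tensor(times)
    return t.mean().item(), t.std().item()
```

The operation is passed as a zero-argument callable, e.g. `time_gpu_op(lambda: tensor.fill_(255))`.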
Write Latency:
- Performance is checked with the `tensor.fill_` method (a short sketch follows this list).
- For maximum bitflips, the byte-tensor is filled with `11111111` (255) after being initialized to `00000000` (0).
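Under these assumptions, the maximum-bitflips write measurement could be set up as follows; the tensor size and variable name are illustrative:

```python
import torch

# A byte tensor, so filling with 255 flips every bit of every element.
n_bytes = 1 << 20  # 1 MiB here; the benchmark sweeps sizes up to several GiB
tensor = torch.zeros(n_bytes, dtype=torch.uint8, device="cuda")

# Maximum bitflips: every byte goes 0b00000000 -> 0b11111111.
tensor.fill_(255)
torch.cuda.synchronize()  # the benchmark brackets the fill with CUDA events instead
```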
Read Latency:
- For minimum bitflips, the tensor is filled with `00000000` (0) after already being set to `00000000` (0).
- The contents of the tensor are read by copying it to another tensor (`outp_tensor.copy_(inp_tensor)`), ensuring all elements are accessed (a short sketch follows this list).
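A matching sketch for the read measurement, assuming a preallocated, zeroed destination tensor; sizes are illustrative:

```python
import torch

n_bytes = 1 << 20
# Both tensors are zeroed: the copy reads every element of inp_tensor
# while writing zeros over zeros, keeping bitflips at a minimum.
inp_tensor = torch.zeros(n_bytes, dtype=torch.uint8, device="cuda")
outp_tensor = torch.zeros(n_bytes, dtype=torch.uint8, device="cuda")

outp_tensor.copy_(inp_tensor)  # forces a read of every element of inp_tensor
torch.cuda.synchronize()
```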
- For each memory operation, bandwidth is calculated by dividing the tensor size by the measured latency:
  - Bandwidth (GiB/s) = Tensor Size (GiB) / Latency (s)
- The standard deviation of the bandwidth is propagated from the latency measurements (a short numeric sketch follows this list).
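As a numeric sketch: for B = S / t, first-order error propagation gives σ_B = B · (σ_t / t). All numbers below are illustrative:

```python
size_gib = 1.0     # tensor size in GiB (illustrative)
lat_mean = 2.0e-3  # mean latency in seconds (illustrative)
lat_std = 1.0e-4   # standard deviation of the latency (illustrative)

bw_mean = size_gib / lat_mean            # 500.0 GiB/s
bw_std = bw_mean * (lat_std / lat_mean)  # 25.0 GiB/s, propagated from lat_std
print(f"{bw_mean:.1f} ± {bw_std:.1f} GiB/s")
```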
Plots are generated for two NVIDIA GPU models present on Snellius.
Run a benchmark script, e.g. the latency and bandwidth benchmark:

```
python test-memory-latency-bandwidth.py
```
Results are printed to the console and saved as a PNG plot named after the detected GPU.
License: MIT