Commit 2db0292

Add MI300X specs to roofline benchmark (#1913)
* Add MI300X specs to roofline benchmark
* Fix shape mismatch error by correcting grad_output shape
* Adjust benchmark command in README.md
1 parent 9ef2f06 commit 2db0292

3 files changed: +15 -2 lines changed

benchmarks/float8/float8_roofline.py

Lines changed: 1 addition & 1 deletion
@@ -372,7 +372,7 @@ def run(
 ).requires_grad_()
 
 # get the gradient of the right shape
-grad_output = torch.randn(N_val, K_val, dtype=torch.bfloat16, device="cuda")
+grad_output = torch.randn(M_val, N_val, dtype=torch.bfloat16, device="cuda")
 
 # get the bf16 gpu kernel time
 torch._dynamo.reset()
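
Why `(M_val, N_val)` is the right shape can be checked with plain shape bookkeeping. The sketch below assumes the benchmarked module is a single linear layer computing `y = x @ w.T`, with input `x` of shape `(M, K)` and weight `w` of shape `(N, K)`; that layout is an assumption for illustration, not something shown in this hunk. The helpers `matmul_shape` and `transpose` are hypothetical names, not part of torch or torchao.

```python
# Shape bookkeeping for y = x @ w.T and its backward pass.
# Assumption: x is (M, K) and w is (N, K), as in a bias-free linear layer.

def matmul_shape(a, b):
    """Return the shape of a @ b for two 2-D shapes, or raise on a mismatch."""
    if a[1] != b[0]:
        raise ValueError(f"cannot multiply {a} by {b}")
    return (a[0], b[1])


def transpose(shape):
    """Return the shape of a transposed 2-D tensor."""
    return (shape[1], shape[0])


M, K, N = 4, 8, 16
x_shape = (M, K)
w_shape = (N, K)

# Forward: (M, K) @ (K, N) -> (M, N)
y_shape = matmul_shape(x_shape, transpose(w_shape))

# The upstream gradient must have the same shape as the output.
grad_output_shape = y_shape

# Backward: grad_input = grad_output @ w, grad_weight = grad_output.T @ x
grad_input_shape = matmul_shape(grad_output_shape, w_shape)
grad_weight_shape = matmul_shape(transpose(grad_output_shape), x_shape)

# The previous (N, K) gradient cannot take part in the same backward matmul.
try:
    matmul_shape((N, K), w_shape)
    old_shape_ok = True
except ValueError:
    old_shape_ok = False
```

With the old `(N_val, K_val)` gradient, the mismatch goes unnoticed only when the three dimensions happen to be equal, which may be why it surfaced with a non-square shape sweep.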

torchao/float8/README.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ Example 2 (large shapes):
 To reproduce the raw data for table above, you can run the following script
 
 ```lang=shell
-python benchmarks/float8/float8_roofline.py your_output_filename.csv --gemm_time_strategy benchmarks --shape_gen_name sweep
+python benchmarks/float8/float8_roofline.py your_output_filename.csv --shape_gen_name sweep
 ```
 
 ## Derivation

torchao/testing/float8/roofline_utils.py

Lines changed: 13 additions & 0 deletions
@@ -41,6 +41,19 @@
         # TODO(future): measure once we have the hardware
         "pct_achievable_mem_bw": 0.92,
     },
+    "AMD Instinct MI300X": {
+        # https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/data-sheets/amd-instinct-mi300x-data-sheet.pdf, page 1,
+        "bf16_peak_tops": 1307e12,
+        "fp8_peak_tops": 2614e12,
+        # 5.3 TB per second
+        "peak_mem_bw_bytes_sec": 5.3e12,
+        # for now, copy over from H100
+        # TODO(future): run measurement on hardware
+        "pct_achievable_gemm_tops": 0.78,
+        # for now, copy over from H100
+        # TODO(future): run measurement on hardware
+        "pct_achievable_mem_bw": 0.92,
+    },
     # TODO(future): more GPU names
 }
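
To give a feel for how these numbers feed a roofline estimate, here is a minimal sketch that copies the MI300X entry above and applies the textbook roofline formulas: GEMM time as FLOPs divided by achievable throughput, and memory time as bytes divided by achievable bandwidth. The function names `gemm_time_s` and `mem_time_s` are hypothetical, and the formulas are the generic model rather than necessarily what `roofline_utils.py` computes.

```python
# Values copied from the "AMD Instinct MI300X" entry in this diff.
# The two pct_* fields are placeholders carried over from H100, per the diff comments.
MI300X = {
    "bf16_peak_tops": 1307e12,
    "fp8_peak_tops": 2614e12,
    "peak_mem_bw_bytes_sec": 5.3e12,
    "pct_achievable_gemm_tops": 0.78,
    "pct_achievable_mem_bw": 0.92,
}


def gemm_time_s(M, K, N, peak_tops, specs):
    """Estimated time of an (M, K) x (K, N) GEMM: 2*M*K*N FLOPs at achievable throughput."""
    flops = 2 * M * K * N
    return flops / (peak_tops * specs["pct_achievable_gemm_tops"])


def mem_time_s(num_bytes, specs):
    """Estimated time to move num_bytes through memory at achievable bandwidth."""
    achievable_bw = specs["peak_mem_bw_bytes_sec"] * specs["pct_achievable_mem_bw"]
    return num_bytes / achievable_bw


M = K = N = 8192
bf16_time = gemm_time_s(M, K, N, MI300X["bf16_peak_tops"], MI300X)
fp8_time = gemm_time_s(M, K, N, MI300X["fp8_peak_tops"], MI300X)

# Reading and writing the three bf16 matrices once (2 bytes per element).
bf16_bytes = 2 * (M * K + K * N + M * N)
bf16_mem_time = mem_time_s(bf16_bytes, MI300X)

print(f"bf16 gemm: {bf16_time:.3e} s, fp8 gemm: {fp8_time:.3e} s")
print(f"bf16 memory traffic: {bf16_mem_time:.3e} s")
```

Because the fp8 peak is exactly twice the bf16 peak and both share the same achievable fraction, this model predicts a 2x GEMM speedup ceiling. Any float8 casting or scaling overhead would reduce it in practice.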
