I'm not seeing anything to indicate that the fallback is faster than standard cuE (v0.0.2); in the runs below it is slower in every configuration. Modified code and the first 5 outputs from the uvu benchmark (two MACE sizes and a few Nequip configurations) are below, run on an A100-SXM-80GB.
Output trimmed to remove some unnecessary lines:
python test/benchmark.py -o outputs/uvu uvu --plot -i cue -d forward
2025-03-05 21:48:54,050 - INFO - Config: ChannelwiseTPP(128x0e+128x1o+128x2e x 1x0e+1x1o+1x2e+1x3o) -> 384x0e+640x1o+640x2e+512x3o
2025-03-05 21:49:40,139 - INFO - Avg. Throughput: 1624.90 ± 37.64 GFLOPS
2025-03-05 21:49:40,139 - INFO - Avg. Bandwidth : 881.67 ± 20.42 GBPS
2025-03-05 21:49:40,172 - INFO - Finished Test ID: 0
2025-03-05 21:49:40,172 - INFO - Starting Test ID: 1
2025-03-05 21:49:40,172 - INFO - Config: ChannelwiseTPP(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o) -> 256x0e+384x1o+384x2e
2025-03-05 21:49:41,498 - INFO - Avg. Throughput: 1055.03 ± 2.87 GFLOPS
2025-03-05 21:49:41,498 - INFO - Avg. Bandwidth : 804.46 ± 2.19 GBPS
2025-03-05 21:49:41,511 - INFO - Finished Test ID: 1
2025-03-05 21:49:41,511 - INFO - Starting Test ID: 2
2025-03-05 21:49:41,511 - INFO - Config: ChannelwiseTPP(32x0o+32x0e+32x1o+32x1e+32x2o+32x2e x 1x0e+1x1o+1x2e) -> 96x0o+96x0e+192x1o+192x1e+192x2o+192x2e
2025-03-05 21:49:43,096 - INFO - Avg. Throughput: 1335.48 ± 8.21 GFLOPS
2025-03-05 21:49:43,096 - INFO - Avg. Bandwidth : 868.82 ± 5.34 GBPS
2025-03-05 21:49:43,109 - INFO - Finished Test ID: 2
2025-03-05 21:49:43,109 - INFO - Starting Test ID: 3
2025-03-05 21:49:43,109 - INFO - Config: ChannelwiseTPP(64x0o+64x0e+64x1o+64x1e x 1x0e+1x1o) -> 128x0o+128x0e+192x1o+192x1e
2025-03-05 21:49:44,669 - INFO - Avg. Throughput: 571.48 ± 29.99 GFLOPS
2025-03-05 21:49:44,669 - INFO - Avg. Bandwidth : 776.10 ± 40.72 GBPS
2025-03-05 21:49:44,705 - INFO - Finished Test ID: 3
2025-03-05 21:49:44,705 - INFO - Starting Test ID: 4
2025-03-05 21:49:44,705 - INFO - Config: ChannelwiseTPP(64x0o+64x0e+64x1o+64x1e+64x2o+64x2e x 1x0e+1x1o+1x2e) -> 192x0o+192x0e+384x1o+384x1e+384x2o+384x2e
2025-03-05 21:49:46,176 - INFO - Avg. Throughput: 1517.64 ± 13.34 GFLOPS
2025-03-05 21:49:46,176 - INFO - Avg. Bandwidth : 986.41 ± 8.67 GBPS
(cue_env) vbharadw@nid200273:/global/cfs/projectdirs/m1982/vbharadw/equivariant_spmm> python test/benchmark.py -o outputs/uvu_fallback uvu --plot -i cue -d forward
2025-03-05 21:54:06,062 - INFO - Starting Test ID: 0
2025-03-05 21:54:06,062 - INFO - Config: ChannelwiseTPP(128x0e+128x1o+128x2e x 1x0e+1x1o+1x2e+1x3o) -> 384x0e+640x1o+640x2e+512x3o
2025-03-05 21:55:12,546 - INFO - Avg. Throughput: 358.82 ± 1.43 GFLOPS
2025-03-05 21:55:12,546 - INFO - Avg. Bandwidth : 194.70 ± 0.78 GBPS
2025-03-05 21:55:12,558 - INFO - Finished Test ID: 0
2025-03-05 21:55:12,558 - INFO - Starting Test ID: 1
2025-03-05 21:55:12,559 - INFO - Config: ChannelwiseTPP(128x0e+128x1o x 1x0e+1x1o+1x2e+1x3o) -> 256x0e+384x1o+384x2e
2025-03-05 21:55:16,010 - INFO - Avg. Throughput: 217.33 ± 0.44 GFLOPS
2025-03-05 21:55:16,010 - INFO - Avg. Bandwidth : 165.72 ± 0.34 GBPS
2025-03-05 21:55:16,022 - INFO - Finished Test ID: 1
2025-03-05 21:55:16,022 - INFO - Starting Test ID: 2
2025-03-05 21:55:16,022 - INFO - Config: ChannelwiseTPP(32x0o+32x0e+32x1o+32x1e+32x2o+32x2e x 1x0e+1x1o+1x2e) -> 96x0o+96x0e+192x1o+192x1e+192x2o+192x2e
2025-03-05 21:55:23,200 - INFO - Avg. Throughput: 169.93 ± 1.00 GFLOPS
2025-03-05 21:55:23,200 - INFO - Avg. Bandwidth : 110.55 ± 0.65 GBPS
2025-03-05 21:55:23,211 - INFO - Finished Test ID: 2
2025-03-05 21:55:23,211 - INFO - Starting Test ID: 3
2025-03-05 21:55:23,211 - INFO - Config: ChannelwiseTPP(64x0o+64x0e+64x1o+64x1e x 1x0e+1x1o) -> 128x0o+128x0e+192x1o+192x1e
2025-03-05 21:55:25,670 - INFO - Avg. Throughput: 167.94 ± 1.84 GFLOPS
2025-03-05 21:55:25,671 - INFO - Avg. Bandwidth : 228.07 ± 2.50 GBPS
2025-03-05 21:55:25,686 - INFO - Finished Test ID: 3
2025-03-05 21:55:25,686 - INFO - Starting Test ID: 4
2025-03-05 21:55:25,686 - INFO - Config: ChannelwiseTPP(64x0o+64x0e+64x1o+64x1e+64x2o+64x2e x 1x0e+1x1o+1x2e) -> 192x0o+192x0e+384x1o+384x1e+384x2o+384x2e
2025-03-05 21:55:34,037 - INFO - Avg. Throughput: 196.71 ± 0.25 GFLOPS
2025-03-05 21:55:34,038 - INFO - Avg. Bandwidth : 127.85 ± 0.16 GBPS
2025-03-05 21:55:34,094 - INFO - Finished Test ID: 4
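For a quick side-by-side, here is a minimal sketch that pulls the `Avg. Throughput` values out of the two log excerpts above and prints the per-test ratio. It is not part of `test/benchmark.py`; the helper name `parse_throughputs` and the variable names are made up for illustration, and the log lines are pasted in by hand from the output above.

```python
import re

# Matches lines such as "... - INFO - Avg. Throughput: 1624.90 ± 37.64 GFLOPS".
THROUGHPUT_RE = re.compile(r"Avg\. Throughput:\s*([\d.]+)\s*±\s*([\d.]+)\s*GFLOPS")


def parse_throughputs(log_text):
    """Return the mean throughput (GFLOPS) of each test, in log order.

    Hypothetical helper for this comparison only.
    """
    return [float(m.group(1)) for m in THROUGHPUT_RE.finditer(log_text)]


# Throughput lines copied from the standard cuE run (outputs/uvu).
standard_log = """
2025-03-05 21:49:40,139 - INFO - Avg. Throughput: 1624.90 ± 37.64 GFLOPS
2025-03-05 21:49:41,498 - INFO - Avg. Throughput: 1055.03 ± 2.87 GFLOPS
2025-03-05 21:49:43,096 - INFO - Avg. Throughput: 1335.48 ± 8.21 GFLOPS
2025-03-05 21:49:44,669 - INFO - Avg. Throughput: 571.48 ± 29.99 GFLOPS
2025-03-05 21:49:46,176 - INFO - Avg. Throughput: 1517.64 ± 13.34 GFLOPS
"""

# Throughput lines copied from the fallback run (outputs/uvu_fallback).
fallback_log = """
2025-03-05 21:55:12,546 - INFO - Avg. Throughput: 358.82 ± 1.43 GFLOPS
2025-03-05 21:55:16,010 - INFO - Avg. Throughput: 217.33 ± 0.44 GFLOPS
2025-03-05 21:55:23,200 - INFO - Avg. Throughput: 169.93 ± 1.00 GFLOPS
2025-03-05 21:55:25,670 - INFO - Avg. Throughput: 167.94 ± 1.84 GFLOPS
2025-03-05 21:55:34,037 - INFO - Avg. Throughput: 196.71 ± 0.25 GFLOPS
"""

standard = parse_throughputs(standard_log)
fallback = parse_throughputs(fallback_log)

# Ratio > 1 means the standard path is faster than the fallback.
ratios = [s / f for s, f in zip(standard, fallback)]

for test_id, (s, f, r) in enumerate(zip(standard, fallback, ratios)):
    print(f"Test {test_id}: standard {s:8.2f} GFLOPS | fallback {f:7.2f} GFLOPS | {r:.2f}x")
```

On these five configurations the standard path comes out roughly 3.4x to 7.9x faster than the fallback.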
Batch size 50K. See
#51 for context