Open
Description
Which component has the problem?
CUTLASS C++
Bug Report
Describe the bug
Hi, I vaguely remember being told that there might be a problem with CUDA 12.8.0, but I want to double-check.
Steps/Code to reproduce bug
Run the reproducer at https://gist.github.com/henrylhtsang/db1cae40bdae68a5949ce2957a1399c7
Observed output:
Computing CPU reference result...
Comparing CUTLASS result with CPU reference...
Position (0,0): CUTLASS=6.79688, Reference=11.5403, Diff=4.74343
Position (0,1): CUTLASS=5.07812, Reference=1.29753, Diff=3.7806
Position (0,2): CUTLASS=6.82812, Reference=-2.35956, Diff=9.18768
Position (0,3): CUTLASS=2.6543, Reference=0.975887, Diff=1.67841
Position (0,4): CUTLASS=-4.71875, Reference=3.52464, Diff=8.24339
Position (0,5): CUTLASS=2.55078, Reference=-6.25447, Diff=8.80525
Position (0,6): CUTLASS=-8.85156, Reference=5.35038, Diff=14.2019
Position (0,7): CUTLASS=0.249634, Reference=-0.31616, Diff=0.565793
Position (0,8): CUTLASS=1.69727, Reference=-7.75954, Diff=9.45681
Position (0,9): CUTLASS=-4.54297, Reference=-4.8687, Diff=0.325732
Max difference: 35.5828, Total differences: 65470
❌ CUTLASS result differs from CPU reference!
CUDA Runtime Version: 12080
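For reference, `12080` matches CUDA 12.8, assuming the line above prints the integer returned by `cudaRuntimeGetVersion`. CUDA encodes that integer as `1000 * major + 10 * minor`. A minimal host-only sketch of the decoding follows; `CudaVersion`, `decode_cuda_version`, and `format_cuda_version` are hypothetical names for illustration and are not part of the gist or of CUTLASS.

```cpp
#include <string>

// Hypothetical holder for a decoded CUDA version.
struct CudaVersion {
    int major_version;
    int minor_version;
};

// Decode a CUDA version integer, encoded as 1000 * major + 10 * minor
// (the encoding used by CUDART_VERSION and cudaRuntimeGetVersion).
constexpr CudaVersion decode_cuda_version(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Format the decoded version as "major.minor".
inline std::string format_cuda_version(const CudaVersion& v) {
    return std::to_string(v.major_version) + "." +
           std::to_string(v.minor_version);
}
```

Calling `format_cuda_version(decode_cuda_version(12080))` gives the string `12.8`. The patch level (the trailing `.0` in 12.8.0) is not part of this integer.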
Expected behavior
The output should be:
Max difference: 0, Total differences: 0
✅ CUTLASS result matches CPU reference!
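For readers who do not want to open the gist, the two summary numbers can be produced by an element-wise comparison along the following lines. This is only a sketch of that kind of check and not the gist's actual code: `CompareResult`, `compare_buffers`, and the absolute tolerance `tol` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical summary of a comparison: the largest absolute difference
// and the number of elements whose difference exceeds the tolerance.
struct CompareResult {
    double max_diff = 0.0;
    std::size_t num_diffs = 0;
};

// Compare a device result (already copied to the host) with a CPU
// reference, element by element. `tol` is an assumed absolute tolerance.
inline CompareResult compare_buffers(const std::vector<float>& result,
                                     const std::vector<float>& reference,
                                     double tol) {
    if (result.size() != reference.size()) {
        throw std::invalid_argument("buffers must have the same size");
    }
    CompareResult out;
    for (std::size_t i = 0; i < result.size(); ++i) {
        const double diff = std::fabs(static_cast<double>(result[i]) -
                                      static_cast<double>(reference[i]));
        out.max_diff = std::max(out.max_diff, diff);
        if (diff > tol) {
            ++out.num_diffs;
        }
    }
    return out;
}
```

With a check like this, a passing run has `num_diffs == 0`. Whether `max_diff` is exactly zero depends on the data types and inputs used, so the tolerance may need to be chosen to suit them.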
Environment details (please complete the following information):
Tested with CUTLASS 4.1.0 and 3.9.2, and CUDA 12.8.0, on H100.
Additional context
NA