Which component has the problem?
CUTLASS C++
Bug Report
Describe the bug
Hi. I noticed that I can hit a CUDA illegal memory access (IMA) in the backward pass when testing Example 77 on a Blackwell machine (B200 / GB200). I also seem to get some errors from compute-sanitizer in racecheck and synccheck mode. Note that in the default compute-sanitizer mode (memcheck) it seems fine.
TL;DR of the errors:
Error: Race reported between Write access at void cutlass::device_kernel<cutlass::fmha::kernel::Sm100FmhaBwdKernelTmaWarpSpecialized...
========= Barrier error detected. Missing wait.
========= at void cutlass::device_kernel<cutlass::fmha::kernel::Sm100FmhaBwdKernelTmaWarpSpecialized
Please let me know if my setup is correct, and if this is indeed an issue.
Steps/Code to reproduce bug
mkdir build
cd build
cmake .. -DCUTLASS_NVCC_ARCHS=100a
targets=(
    test_examples_77_blackwell_fmha_bwd_fp16_test_basic
    test_examples_77_blackwell_fmha_bwd_fp16_test_varlen
)
for test in "${targets[@]}"
do
    echo "Running $test" 2>&1 | tee -a ~/cutlass/output.log
    make $test 2>&1 | tee -a ~/cutlass/output.log
    echo "Running compute sanitizer $test" 2>&1 | tee -a ~/cutlass/output.log
    compute-sanitizer make $test 2>&1 | tee -a ~/cutlass/output.log
    echo "Running compute sanitizer memcheck $test" 2>&1 | tee -a ~/cutlass/output.log
    compute-sanitizer --tool=memcheck make $test 2>&1 | tee -a ~/cutlass/output.log
    echo "Running compute sanitizer racecheck $test" 2>&1 | tee -a ~/cutlass/output.log
    compute-sanitizer --tool=racecheck make $test 2>&1 | tee -a ~/cutlass/output.log
    echo "Running compute sanitizer synccheck $test" 2>&1 | tee -a ~/cutlass/output.log
    compute-sanitizer --tool=synccheck make $test 2>&1 | tee -a ~/cutlass/output.log
done
cmake logs: https://gist.github.com/henrylhtsang/f6c02c62008d6deaf36dc71ac2dfb0ac
compute sanitizer logs: https://gist.github.com/henrylhtsang/3938ab718a1c9c31863e858a6d0060ca
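Since the combined log is long, here is a rough sketch for counting the two kinds of findings quoted above. The function name `summarize_sanitizer_log` is hypothetical (not part of CUTLASS or compute-sanitizer). It only assumes the log contains the strings "Race reported between" and "Barrier error detected", as in the excerpts in this report.

```shell
# Hypothetical helper: count racecheck/synccheck findings in a saved log.
# Assumes the log uses the message strings quoted in this report.
summarize_sanitizer_log() {
    log="$1"
    # grep -c prints the number of matching lines (0 if none).
    races=$(grep -c "Race reported between" "$log")
    barriers=$(grep -c "Barrier error detected" "$log")
    echo "races=$races barriers=$barriers"
}
```

For the run above, it would be called as `summarize_sanitizer_log ~/cutlass/output.log`.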
Expected behavior
No errors from compute sanitizer.
Environment details (please complete the following information):
- CUTLASS commit: 56f0718
- GPU: B200
- CUDA toolkit V12.8.93
- Driver Version: 570.124.06
Additional context
cc @sryap