Closed
Description
Hi CUTLASS team,
I'm encountering errors when building and running the following SM120 examples from cutlass/examples on an SM120 GPU:
- 79_blackwell_geforce_gemm
Build command:
nvcc -std=c++17 -gencode=arch=compute_120,code=sm_120 \
-I /cutlass/include -I /cutlass/tools/util/include -I /cutlass/examples/common \
-o /cutlass/examples/79_blackwell_geforce_gemm/79a_blackwell_geforce_nvfp4_bf16_gemm \
/cutlass/examples/79_blackwell_geforce_gemm/79a_blackwell_geforce_nvfp4_bf16_gemm.cu -lcuda
Run command:
./examples/79_blackwell_geforce_gemm/79a_blackwell_geforce_nvfp4_bf16_gemm --m=2048 --n=2048 --k=2048
Runtime error:
ERROR : Arch conditional MMA instruction used without targeting appropriate compute capability. Aborting.
- 80_blackwell_geforce_sparse_gemm
Build command:
nvcc -std=c++17 -gencode=arch=compute_120,code=sm_120 \
-I /cutlass/include -I /cutlass/tools/util/include -I /cutlass/examples/common \
-o /cutlass/examples/80_blackwell_geforce_sparse_gemm/80b_blackwell_geforce_nvfp4_nvfp4_sparse_gemm \
/cutlass/examples/80_blackwell_geforce_sparse_gemm/80b_blackwell_geforce_nvfp4_nvfp4_sparse_gemm.cu -lcuda
Run command:
./examples/80_blackwell_geforce_sparse_gemm/80a_blackwell_geforce_mxfp8_bf16_sparse_gemm --m=1024 --n=1024 --k=1024
Run errors:
ptxas ... error : Feature '.kind::mxf8f6f4' not supported on .target 'sm_120'
ptxas ... error : Feature '.block_scale' not supported on .target 'sm_120'
ptxas ... error : Feature '.scale_vec::1X' not supported on .target 'sm_120'
ptxas ... error : Instruction 'mma with block scale' not supported on .target 'sm_120'
ptxas fatal : Ptx assembly aborted due to errors
Additional Information
I can run ./tools/profiler/cutlass_profiler .... on an SM120 machine without error, but these examples fail. My CUDA toolkit version is 12.9 and my GPU is SM120. Both examples are supposed to target the Blackwell SM120 architecture.
Could you please advise if there are any additional requirements, known issues, or workarounds for running these examples on SM120? Is there a specific CUDA version, driver, or CUTLASS branch required for these kernels to work on Blackwell GPUs?
Thank you!
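A variant of the build command that may be worth trying, offered as an assumption rather than a fix confirmed in this report: block-scaled MMA instructions are architecture-specific features, and nvcc generally enables those only for `a`-suffixed targets such as `sm_120a`, not for plain `sm_120`. The sketch below only prints the adjusted command (a dry run). `gencode_flag` is a hypothetical helper written for illustration, not part of CUTLASS or the CUDA toolkit.

```shell
#!/bin/sh
# Hypothetical helper (not part of CUTLASS or CUDA): builds the -gencode flag
# for an architecture string such as "120" or "120a".
gencode_flag() {
    printf '%s' "-gencode=arch=compute_$1,code=sm_$1"
}

# Assumption: the "a" suffix selects the architecture-specific feature set
# that the block-scaled MMA instructions need.
ARCH=120a
SRC=/cutlass/examples/79_blackwell_geforce_gemm/79a_blackwell_geforce_nvfp4_bf16_gemm.cu

# Dry run: print the nvcc command instead of executing it.
echo nvcc -std=c++17 "$(gencode_flag "$ARCH")" \
    -I /cutlass/include -I /cutlass/tools/util/include -I /cutlass/examples/common \
    -o "${SRC%.cu}" "$SRC" -lcuda
```

If the printed command looks right, it can be run by removing the leading `echo`. The same flag change would apply to the example 80 build.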