[QST] Errors When Building and Running Blackwell SM120 Examples #2451

@jianyingzhu

Hi CUTLASS team,

I'm encountering errors when building and running the following SM120 examples from cutlass/examples on an SM120 GPU:

  1. 79_blackwell_geforce_gemm
    Build command:
nvcc -std=c++17 -gencode=arch=compute_120,code=sm_120 \
  -I /cutlass/include -I /cutlass/tools/util/include -I /cutlass/examples/common \
  -o /cutlass/examples/79_blackwell_geforce_gemm/79a_blackwell_geforce_nvfp4_bf16_gemm \
  /cutlass/examples/79_blackwell_geforce_gemm/79a_blackwell_geforce_nvfp4_bf16_gemm.cu -lcuda

Run command:

./examples/79_blackwell_geforce_gemm/79a_blackwell_geforce_nvfp4_bf16_gemm --m=2048 --n=2048 --k=2048

Runtime error:

ERROR : Arch conditional MMA instruction used without targeting appropriate compute capability. Aborting.

  2. 80_blackwell_geforce_sparse_gemm
    Build command:
nvcc -std=c++17 -gencode=arch=compute_120,code=sm_120 \
  -I /cutlass/include -I /cutlass/tools/util/include -I /cutlass/examples/common \
  -o /cutlass/examples/80_blackwell_geforce_sparse_gemm/80b_blackwell_geforce_nvfp4_nvfp4_sparse_gemm \
  /cutlass/examples/80_blackwell_geforce_sparse_gemm/80b_blackwell_geforce_nvfp4_nvfp4_sparse_gemm.cu -lcuda

Run command:

./examples/80_blackwell_geforce_sparse_gemm/80a_blackwell_geforce_mxfp8_bf16_sparse_gemm --m=1024 --n=1024 --k=1024

Build errors (reported by ptxas during compilation):

ptxas ... error   : Feature '.kind::mxf8f6f4' not supported on .target 'sm_120'
ptxas ... error   : Feature '.block_scale' not supported on .target 'sm_120'
ptxas ... error   : Feature '.scale_vec::1X' not supported on .target 'sm_120'
ptxas ... error   : Instruction 'mma with block scale' not supported on .target 'sm_120'
ptxas fatal   : Ptx assembly aborted due to errors
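
Both failures point at the compilation target. The ptxas messages reject `.kind::mxf8f6f4`, `.block_scale`, and block-scaled `mma` for `.target 'sm_120'`, and the runtime message from the first example says an arch-conditional MMA instruction was reached without the matching compute capability. One hypothesis worth checking (an assumption, not confirmed in this thread) is that these instructions are only available on the architecture-specific target `sm_120a`, i.e. `-gencode=arch=compute_120a,code=sm_120a`, rather than the plain `sm_120` target used in both build commands. A minimal sketch of two hypothetical shell helpers, `gencode_for` and `is_arch_specific` (neither is part of CUTLASS or the CUDA toolkit), for deriving and checking such a flag:

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of CUTLASS or the CUDA toolkit.
# Assumption: block-scaled MMA on SM120 needs the arch-specific "a" target.

# gencode_for MAJOR.MINOR -> prints an arch-specific -gencode flag,
# e.g. "12.0" -> "-gencode=arch=compute_120a,code=sm_120a"
gencode_for() {
  local cc="$1"
  local major="${cc%%.*}"   # digits before the dot
  local minor="${cc#*.}"    # digits after the dot
  local sm="${major}${minor}a"
  printf '%s\n' "-gencode=arch=compute_${sm},code=sm_${sm}"
}

# is_arch_specific FLAG -> exit status 0 if the code= target ends in "a"
is_arch_specific() {
  case "$1" in
    *code=sm_[0-9]*a) return 0 ;;
    *) return 1 ;;
  esac
}
```

With these helpers sourced, the first build command could be retried by replacing `-gencode=arch=compute_120,code=sm_120` with `"$(gencode_for 12.0)"` and keeping the other arguments unchanged; whether that resolves both errors is not verified here.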

Additional Information
I can run ./tools/profiler/cutlass_profiler .... on the SM120 machine without errors, but these examples fail. My CUDA toolkit version is 12.9, and the GPU is SM120 (compute capability 12.0). Both examples are intended for the Blackwell SM120 architecture.
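
To make the environment details above easy to confirm, here is a sketch of two hypothetical shell helpers: `nvcc_release` extracts the toolkit release from `nvcc --version` output, and `version_at_least` compares dotted versions using GNU `sort -V`. The 12.8 minimum in the usage line is an assumption based on CUDA 12.8 being the first toolkit release to support SM120. The GPU's compute capability can be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, which should print `12.0` on an SM120 GPU.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the environment described in this report.

# version_at_least HAVE NEED -> exit status 0 if HAVE >= NEED (uses GNU sort -V)
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# nvcc_release TEXT -> prints the "MAJOR.MINOR" release from `nvcc --version`
# output, i.e. from a line like "Cuda compilation tools, release 12.9, ..."
nvcc_release() {
  printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}
```

For example: `version_at_least "$(nvcc_release "$(nvcc --version)")" 12.8 && echo "toolkit is 12.8 or newer"`.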

Could you please advise if there are any additional requirements, known issues, or workarounds for running these examples on SM120? Is there a specific CUDA version, driver, or CUTLASS branch required for these kernels to work on Blackwell GPUs?

Thank you!
