Closed
Which component has the problem?
CuTe DSL
Bug Report
Describe the bug
nvidia-cutlass-dsl 4.2 supports TMA_REDUCE_ADD, but it errors when the copy is given a tma_descriptor_ptr. As a result, it works for a regular GEMM but not for a grouped GEMM, which passes per-group TMA descriptors through tma_descriptor_ptr.
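To make the requested behaviour concrete, here is a CPU-only Python model of the two store flavours. It is not CuTe DSL code. The function tma_store_tile and the list of per-group outputs are hypothetical stand-ins. The list index plays the role that a per-group TMA descriptor plays in the grouped GEMM kernel. A plain tile store overwrites the global tile, while a reduce-add store adds the shared-memory tile into what is already in global memory.

```python
# CPU-only model of a TMA tile store vs. a TMA tile reduce-add store.
# Nothing here is CuTe DSL API: `tma_store_tile` and `group_outputs` are
# hypothetical names used only to illustrate the semantics.
from typing import List

Matrix = List[List[float]]


def tma_store_tile(dst: Matrix, tile: Matrix, row0: int, col0: int,
                   reduce_add: bool = False) -> None:
    """Write `tile` into `dst` starting at (row0, col0).

    reduce_add=False models CopyBulkTensorTileS2GOp (plain overwrite).
    reduce_add=True models CopyReduceBulkTensorTileS2GOp(ReductionOp.ADD),
    where each element is added to the value already in global memory.
    """
    for i, row in enumerate(tile):
        for j, value in enumerate(row):
            if reduce_add:
                dst[row0 + i][col0 + j] += value
            else:
                dst[row0 + i][col0 + j] = value


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


# Grouped GEMM: one output tensor per group. The list index stands in for
# the per-group descriptor that the kernel selects at run time.
group_outputs = [zeros(2, 2), zeros(2, 2)]
partial = [[1.0, 2.0], [3.0, 4.0]]

# Plain store into group 0: the second store overwrites the first.
tma_store_tile(group_outputs[0], partial, 0, 0)
tma_store_tile(group_outputs[0], partial, 0, 0)

# Reduce-add store into group 1: the two partial tiles accumulate.
tma_store_tile(group_outputs[1], partial, 0, 0, reduce_add=True)
tma_store_tile(group_outputs[1], partial, 0, 0, reduce_add=True)

print(group_outputs[0])  # [[1.0, 2.0], [3.0, 4.0]]
print(group_outputs[1])  # [[2.0, 4.0], [6.0, 8.0]]
```

In the real kernel the reduction happens in hardware as part of the TMA store. The model only shows why selecting the destination through a per-group descriptor matters for the reduce-add variant as much as for the plain store.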
Steps/Code to reproduce bug
I took the Blackwell grouped_gemm.py example and replaced cpasync.CopyBulkTensorTileS2GOp() with cpasync.CopyReduceBulkTensorTileS2GOp(cute.ReductionOp.ADD). It then fails with:
File "/root/.local/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cute/nvgpu/cpasync/copy.py", line 458, in unpack
attr = ir.Attribute.parse(attr_str)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cutlass._mlir._mlir_libs._site_initialize.<locals>.MLIRError: Unable to parse attribute:
error: "#cute_nvgpu.atom_copy_field_tmareduce<tma_descriptor_ptr>":1:13: unknown attribute `atom_copy_field_tmareduce` in dialect `cute_nvgpu
milesvant