[BUG] TMA_REDUCE_ADD does not accept tma_descriptor_ptr #2656

### Which component has the problem?

CuTe DSL

### Bug Report

**Describe the bug**
nvidia-cutlass-dsl 4.2 supports TMA_REDUCE_ADD, but it raises an error when given a `tma_descriptor_ptr`. As a result, it works for regular GEMM but not for grouped GEMM.

**Steps/Code to reproduce bug**
I took the Blackwell `grouped_gemm.py` example and replaced `cpasync.CopyBulkTensorTileS2GOp()` with `cpasync.CopyReduceBulkTensorTileS2GOp(cute.ReductionOp.ADD)`. It then fails with:
```
  File "/root/.local/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cute/nvgpu/cpasync/copy.py", line 458, in unpack
    attr = ir.Attribute.parse(attr_str)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cutlass._mlir._mlir_libs._site_initialize.<locals>.MLIRError: Unable to parse attribute:
error: "#cute_nvgpu.atom_copy_field_tmareduce<tma_descriptor_ptr>":1:13: unknown attribute `atom_copy_field_tmareduce` in dialect `cute_nvgpu
```
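To make the substitution above easy to repeat, here is a minimal sketch that applies it to a local copy of the example as a plain text replacement. The two op strings are the ones quoted in this report; the helper names `patch_source` and `patch_file` and any file paths are hypothetical and not part of CUTLASS.

```python
from pathlib import Path

# Both strings are copied verbatim from the reproduction steps in this report.
ORIGINAL_OP = "cpasync.CopyBulkTensorTileS2GOp()"
REDUCE_OP = "cpasync.CopyReduceBulkTensorTileS2GOp(cute.ReductionOp.ADD)"


def patch_source(source: str) -> str:
    """Return `source` with the plain S2G store op swapped for the reduce-add op."""
    if ORIGINAL_OP not in source:
        # Fail loudly so a changed example file is not silently left unpatched.
        raise ValueError(f"{ORIGINAL_OP!r} not found; nothing to patch")
    return source.replace(ORIGINAL_OP, REDUCE_OP)


def patch_file(src: Path, dst: Path) -> None:
    """Write a patched copy of `src` to `dst`, leaving `src` untouched."""
    dst.write_text(patch_source(src.read_text()))
```

For example, `patch_file(Path("grouped_gemm.py"), Path("grouped_gemm_reduce.py"))` would write a patched copy next to the original, assuming the example has been copied into the working directory under that name. Running the patched copy should then hit the `MLIRError` shown above.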
