[BUG] TMA_REDUCE_ADD does not accept tma_descriptor_ptr #2656

@tridao

Which component has the problem?

CuTe DSL

Bug Report

Describe the bug
nvidia-cutlass-dsl 4.2 supports TMA_REDUCE_ADD, but it raises an error when given a tma_descriptor_ptr. As a result, it works for regular GEMM but not for grouped GEMM, which passes a tma_descriptor_ptr.

Steps/Code to reproduce bug
I took the Blackwell grouped_gemm.py example and replaced cpasync.CopyBulkTensorTileS2GOp() with cpasync.CopyReduceBulkTensorTileS2GOp(cute.ReductionOp.ADD). It then fails with:

  File "/root/.local/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cute/nvgpu/cpasync/copy.py", line 458, in unpack
    attr = ir.Attribute.parse(attr_str)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cutlass._mlir._mlir_libs._site_initialize.<locals>.MLIRError: Unable to parse attribute:
error: "#cute_nvgpu.atom_copy_field_tmareduce<tma_descriptor_ptr>":1:13: unknown attribute `atom_copy_field_tmareduce` in dialect `cute_nvgpu
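
For anyone trying to reproduce this, the one-line change described above can be sketched as a plain text substitution over the example source. This is only an illustration: the two op expressions are copied from the steps above, while `patch_store_op` and the `sample` line are hypothetical stand-ins, and the sketch assumes the example constructs the op with exactly that call text.

```python
# Sketch: apply the substitution from the repro steps to the example's source text.
# OLD_OP and NEW_OP are taken verbatim from the report; everything else is hypothetical.
OLD_OP = "cpasync.CopyBulkTensorTileS2GOp()"
NEW_OP = "cpasync.CopyReduceBulkTensorTileS2GOp(cute.ReductionOp.ADD)"


def patch_store_op(source: str) -> str:
    """Return `source` with the plain TMA store op swapped for the reduce-add op."""
    if OLD_OP not in source:
        # Assumption: the example spells the call exactly as OLD_OP.
        raise ValueError(f"{OLD_OP!r} not found; the example may construct the op differently")
    return source.replace(OLD_OP, NEW_OP)


# Hypothetical stand-in for the relevant line in grouped_gemm.py.
sample = "store_op = cpasync.CopyBulkTensorTileS2GOp()\n"
patched = patch_store_op(sample)
print(patched, end="")
```

Running the patched example should then hit the attribute parse error shown in the traceback above.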
