[FEA] Complete the cutlass::library::GemmDescription class to cover Hopper GEMM kernels #2073

@manishucsd

Issue

The CUTLASS GemmDevice operator carries compile-time attributes, both functional and performance-related. The GemmDevice operator is consumed by GemmOperation[3xBase], which uses these attributes to set the data members of GemmDescription. In the past, I have found some of the values in GemmDescription to be incorrect, and others to be missing entirely.

For example, consider the kernel whose full procedural name is cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_bf16_bf16_128x128x128_1x2x1_0_tnn_align16_warpspecialized_cooperative_epi_tma

This kernel has the following functional and performance attributes:

Functional Attributes

  • dtypeA_dtypeB_dtypeAccumulation_dtypeC_dtypeD : e4m3_e4m3_f32_bf16_bf16
  • RowMajor_ColumnMajor_ColumnMajor : tnn

Performance Attributes

  • Instruction Shape : 64x128x32
  • Threadblock Shape : 128x128x128
  • Cluster Shape : 1x2x1
  • Mainloop Kind : warpspecialized_cooperative // This is missing from GemmDescription
  • Epilogue Kind : epi_tma // This is missing from GemmDescription
  • AccumulationKind : "fastaccum" or default "" // This is missing from GemmDescription
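
To make the three missing attributes concrete, here is a minimal sketch that recovers them from the procedural name by substring matching. Everything in it is hypothetical: the enums MainloopKind, EpilogueKind and AccumulationKind, the KernelKinds struct, and the kinds_from_name helper are illustration-only names and are not part of cutlass::library. Only the suffixes that appear in this issue are recognized.

```cpp
#include <cassert>
#include <string>

namespace name_sketch {

// Hypothetical enums for illustration only; NOT the cutlass::library API.
enum class MainloopKind { kUnknown, kWarpSpecializedCooperative };
enum class EpilogueKind { kUnknown, kTma };
enum class AccumulationKind { kDefault, kFastAccum };

// Hypothetical holder for the three attributes missing from GemmDescription.
struct KernelKinds {
  MainloopKind mainloop = MainloopKind::kUnknown;
  EpilogueKind epilogue = EpilogueKind::kUnknown;
  AccumulationKind accumulation = AccumulationKind::kDefault;
};

// Returns true if `needle` occurs anywhere in `s`.
inline bool contains(const std::string& s, const std::string& needle) {
  return s.find(needle) != std::string::npos;
}

// Derives the kinds from a procedural kernel name. Only the suffixes
// mentioned in this issue are handled; anything else keeps its default.
inline KernelKinds kinds_from_name(const std::string& name) {
  KernelKinds k;
  if (contains(name, "warpspecialized_cooperative")) {
    k.mainloop = MainloopKind::kWarpSpecializedCooperative;
  }
  if (contains(name, "epi_tma")) {
    k.epilogue = EpilogueKind::kTma;
  }
  if (contains(name, "fastaccum")) {
    k.accumulation = AccumulationKind::kFastAccum;
  }
  return k;
}

}  // namespace name_sketch
```

A test could compare values derived this way against what GemmDescription reports for the same kernel, although the fix itself should take the values from the kernel's compile-time types rather than from the name.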

I have added a test so that someone at NVIDIA can start on this. Can you please uncomment the two lines and add whatever is needed to fix this?

As a model, you can follow any existing enum that is lifted from the internal templates up to the GemmDevice operator and then used to set the data members of the GemmDescription class.
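
The lifting pattern described above could look roughly like the following host-only sketch. All names are assumptions made for illustration: the tag types, the MainloopKindOf and EpilogueKindOf traits, ToyGemmDevice, ToyGemmDescription and make_description are hypothetical stand-ins, not the actual CUTLASS templates or the real GemmDescription layout.

```cpp
#include <cassert>

namespace lift_sketch {

// Hypothetical enums for illustration only; NOT the cutlass::library API.
enum class MainloopKind { kUnknown, kWarpSpecializedCooperative };
enum class EpilogueKind { kUnknown, kTma };

// Hypothetical tag types standing in for internal schedule template parameters.
struct CooperativeMainloopTag {};
struct TmaEpilogueTag {};

// Traits mapping a tag type to its enum value; unknown tags map to kUnknown.
template <class Tag>
struct MainloopKindOf {
  static constexpr MainloopKind value = MainloopKind::kUnknown;
};
template <>
struct MainloopKindOf<CooperativeMainloopTag> {
  static constexpr MainloopKind value = MainloopKind::kWarpSpecializedCooperative;
};

template <class Tag>
struct EpilogueKindOf {
  static constexpr EpilogueKind value = EpilogueKind::kUnknown;
};
template <>
struct EpilogueKindOf<TmaEpilogueTag> {
  static constexpr EpilogueKind value = EpilogueKind::kTma;
};

// Toy device-level operator: lifts the internal tags into public constants.
template <class MainloopTag, class EpilogueTag>
struct ToyGemmDevice {
  static constexpr MainloopKind kMainloopKind = MainloopKindOf<MainloopTag>::value;
  static constexpr EpilogueKind kEpilogueKind = EpilogueKindOf<EpilogueTag>::value;
};

// Toy runtime description holding the lifted values.
struct ToyGemmDescription {
  MainloopKind mainloop_kind = MainloopKind::kUnknown;
  EpilogueKind epilogue_kind = EpilogueKind::kUnknown;
};

// Copies the operator's compile-time constants into the runtime description.
template <class Operator>
ToyGemmDescription make_description() {
  ToyGemmDescription d;
  d.mainloop_kind = Operator::kMainloopKind;
  d.epilogue_kind = Operator::kEpilogueKind;
  return d;
}

// Host-only compile-time check; no GPU is needed to evaluate it.
static_assert(ToyGemmDevice<CooperativeMainloopTag, TmaEpilogueTag>::kMainloopKind ==
                  MainloopKind::kWarpSpecializedCooperative,
              "mainloop kind should be lifted from the tag");

}  // namespace lift_sketch
```

Because the mapping is resolved entirely at compile time, a check of this kind runs on the CPU, either as a static_assert or as an ordinary assertion on the object returned by make_description.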

We can then commit this test, add more tests for Hopper, and make sure this class is also covered for Blackwell. The tests are CPU-only and should not take much time in CI, so they will allow us to catch bugs like this one.
