[QST] How to apply StreamK to Hopper warp specialized GEMM #2075

**What is your question?**
I'm trying to apply `StreamK` or `SplitK` to a Hopper warp-specialized GEMM.
You can see the full code [here](https://github.com/NVIDIA/cutlass/blob/affd1b693dfc121c51118cbc8583dfd308227ca6/examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.cu#L168). The `Gemm` is declared like this:

```c++
using CollectiveEpilogue = typename cutlass::epilogue::collective::CollectiveBuilder<
    ArchTag, OperatorClass,
    TileShape, ClusterShape,
    EpilogueTileType,
    ElementAccumulator, ElementCompute,
    ElementC, LayoutC, AlignmentC,
    ElementD, LayoutD, AlignmentD,
    EpilogueSchedule,
    FusionOperation
  >::CollectiveOp;

using CollectiveMainloopWithBlockWiseScaling = typename cutlass::gemm::collective::CollectiveBuilder<
    ArchTag, OperatorClass,
    ElementA, LayoutA, AlignmentA,
    ElementB, LayoutB, AlignmentB,
    ElementAccumulator,
    TileShape, ClusterShape,
    cutlass::gemm::collective::StageCountAutoCarveout<
      static_cast<int>(sizeof(typename CollectiveEpilogue::SharedStorage))
    >,
    KernelSchedule
  >::CollectiveOp;

using GemmKernel = cutlass::gemm::kernel::GemmUniversal<
    Shape<int,int,int,int>, // Indicates ProblemShape
    CollectiveMainloopWithBlockWiseScaling,
    CollectiveEpilogue
>;
using Gemm = cutlass::gemm::device::GemmUniversalAdapter<GemmKernel>;
```
First, I tried simply replacing `GemmUniversal` with `GemmSplitKParallel` to apply `SplitK`, but that didn't work.

Now I'm trying to apply `StreamK`/`SplitK` to the GEMM in a different way, by passing a tile scheduler to `GemmUniversal` as in this [example](https://github.com/NVIDIA/cutlass/blob/affd1b693dfc121c51118cbc8583dfd308227ca6/examples/74_blackwell_gemm_streamk/blackwell_gemm_streamk.cu#L172):
```c++
using GemmKernel = cutlass::gemm::kernel::GemmUniversal<
    Shape<int,int,int,int>,
    CollectiveOp,
    EpilogueOp,
    cutlass::gemm::StreamKScheduler // <--- Change needed to enable the stream-K scheduler
>;
using Gemm = cutlass::gemm::device::GemmUniversalAdapter<GemmKernel>;
```
But I'm not sure what to do next.
Does this method support FP8 GEMM on Hopper? And which version of CUDA should I use?
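
For context, here is a minimal sketch of the work decomposition I understand stream-K to perform. It is independent of CUTLASS: `WorkRange` and `stream_k_partition` are hypothetical names used only for illustration, and the real scheduler may differ in its details. The idea is that every (output tile, K-iteration) pair is flattened into one linear range, and each CTA gets a contiguous, near-equal share of it, so a CTA's share can cross tile boundaries. Split-K, by contrast, cuts each tile's K range into a fixed number of slices.

```c++
#include <vector>

// Hypothetical helper type (not a CUTLASS API): a half-open range of work
// units, where one unit is one K-iteration of one output tile.
struct WorkRange {
  int begin;
  int end;
};

// Stream-K-style partition (illustrative assumption, not CUTLASS code):
// flatten all (tile, k_iter) pairs into [0, num_tiles * k_iters_per_tile)
// and hand each of num_ctas workers a contiguous, near-equal share.
std::vector<WorkRange> stream_k_partition(int num_tiles,
                                          int k_iters_per_tile,
                                          int num_ctas) {
  const int total = num_tiles * k_iters_per_tile;
  const int base = total / num_ctas;  // units every CTA receives
  const int rem = total % num_ctas;   // the first `rem` CTAs get one extra unit

  std::vector<WorkRange> ranges;
  ranges.reserve(num_ctas);
  int cursor = 0;
  for (int cta = 0; cta < num_ctas; ++cta) {
    const int count = base + (cta < rem ? 1 : 0);
    ranges.push_back({cursor, cursor + count});
    cursor += count;
  }
  return ranges;
}
```

For example, with 3 output tiles of 8 K-iterations each and 4 CTAs, `stream_k_partition(3, 8, 4)` gives each CTA 6 units: CTA 1 covers units 6 to 11, which means it finishes tile 0 and starts tile 1. A tile shared like this needs its partial accumulators reduced across CTAs.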

@jackkosaian I would be very grateful for your help!
