[QST] [CuTeDSL] Redundant wait in Hopper example #2508

@simveit

In the Hopper example we call (at L859):

cute.nvgpu.warpgroup.wait_group(k_pipe_mmas)

As I understand it, this wait is not necessary, because we already wait for all committed wgmma instructions before performing the epilogue (at L918):

cute.nvgpu.warpgroup.wait_group(0)
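
To make the reasoning concrete, here is a minimal pure-Python sketch of the completion accounting I have in mind. It assumes `wait_group(n)` blocks until at most `n` committed groups are still pending. `WgmmaGroupModel` and `run` are hypothetical names for illustration and are not part of CuTeDSL. The model covers only completion counting; it says nothing about other reasons an in-loop wait might exist, such as operand buffer reuse.

```python
from collections import deque


class WgmmaGroupModel:
    """Toy model of committed-but-unfinished wgmma groups (hypothetical)."""

    def __init__(self):
        self.pending = deque()  # committed groups not yet known to be complete
        self.retired = []       # groups in the order a wait observed them done

    def commit_group(self, tag):
        # Stand-in for committing one batch of wgmma instructions.
        self.pending.append(tag)

    def wait_group(self, n):
        # Assumed semantics: block until at most n groups remain pending,
        # retiring the oldest groups first.
        while len(self.pending) > n:
            self.retired.append(self.pending.popleft())


def run(k_tiles, k_pipe_mmas, wait_in_loop):
    model = WgmmaGroupModel()
    for k in range(k_tiles):
        model.commit_group(k)
        if wait_in_loop:
            model.wait_group(k_pipe_mmas)  # the wait in question
    model.wait_group(0)  # final wait before the epilogue
    return model.retired, len(model.pending)
```

In this model, `run(4, 1, True)` and `run(4, 1, False)` both end with every group retired and nothing pending, which is why the in-loop wait looks redundant from a pure completion standpoint.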
