Why not use sync after loading from TMEM to RMEM in example 02_mma_tma_sm100.cu #2525

@gujiewen

According to the PTX documentation, tcgen05.ld is an asynchronous instruction. Why, then, does 02_mma_tma_sm100.cu not issue tcgen05.wait (or a wrapper around it) after the TMEM -> RMEM load and before the loaded registers are read?

// Load TMEM -> RMEM
copy(tiled_t2r_copy, tDtAcc, tDrAcc);
// AXPBY RMEM -> RMEM: tDrC = alpha * tDrAcc + beta * tDrC
axpby(alpha, tDrAcc, beta, tDrC);
// Store RMEM -> GMEM
copy(tDrC, tDgD);
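
To make the question concrete, here is a minimal sketch of the wait I would have expected between the load and the first read of tDrAcc. The helper name `wait_for_tmem_load` is hypothetical and does not come from the example. The sketch assumes a build for an architecture-specific target such as sm_100a, and whether the example actually needs this wait is exactly what I am asking.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not part of 02_mma_tma_sm100.cu): blocks the executing
// thread until its previously issued tcgen05.ld operations have completed.
__device__ __forceinline__ void wait_for_tmem_load() {
#if defined(__CUDA_ARCH_FEAT_SM100_ALL)
  // Assumption: tcgen05 instructions are only emitted for sm_100a-style builds.
  asm volatile("tcgen05.wait::ld.sync.aligned;" ::: "memory");
#endif
}

// Where the call would sit in the epilogue quoted above:
//   copy(tiled_t2r_copy, tDtAcc, tDrAcc);  // Load TMEM -> RMEM (tcgen05.ld)
//   wait_for_tmem_load();                  // proposed wait before tDrAcc is read
//   axpby(alpha, tDrAcc, beta, tDrC);      // AXPBY RMEM -> RMEM
//   copy(tDrC, tDgD);                      // Store RMEM -> GMEM
```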
