[AMDGPU] efficiently wait for direct loads to LDS at all scopes #147258

Open
wants to merge 2 commits into
base: users/ssahasra/waitcnt-always-emit
13 changes: 13 additions & 0 deletions llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1385,6 +1385,19 @@ bool WaitcntGeneratorPreGFX12::applyPreexistingWaitcnt(
ScoreBrackets.simplifyWaitcnt(OldWait, OptNone);
Wait = Wait.combined(OldWait);

if (!WaitcntInstr && II.getOpcode() == AMDGPU::S_WAITCNT_soft) {
  // Each direct load to LDS is also a store to LDS, but we do not have a
  // separate counter for it. Instead these operations increment LOAD_CNT
  // and need to be waited for at a release fence. So we treat a release
  // fence as if it depends on any previous LDS DMA stores.
  //
  // Note that a user-specified S_WAITCNT instruction is not affected; we
  // only check for S_WAITCNT_soft since that represents a fence.
  //
  // FIXME: How does one detect that a soft wait is a release???
  ScoreBrackets.determineWait(LOAD_CNT, FIRST_LDS_VGPR, Wait);
}

// Merge consecutive waitcnt of the same type by erasing multiples.
if (WaitcntInstr ||
    (!Wait.hasWaitExceptStoreCnt() && OpcodeIsSoft && !OptNone)) {
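
The comment in the hunk above describes the idea: a direct load to LDS bumps LOAD_CNT, so a fence must wait until that counter shows the LDS write has landed. A minimal standalone sketch of that bookkeeping follows. It is not the pass's code: `ToyLoadCntBrackets` and all of its members are hypothetical names invented for illustration, and it assumes the counter completes events in issue order.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical toy model, NOT LLVM's ScoreBrackets API. It assumes that
// LOAD_CNT events complete in the order they were issued.
struct ToyLoadCntBrackets {
  unsigned Issued = 0;          // upper bound: events issued so far
  unsigned Completed = 0;       // lower bound: events known to be complete
  unsigned LastLdsDmaScore = 0; // score of the latest LDS DMA (0 = none)

  // Record one operation that increments the counter. A direct load to LDS
  // also remembers its score, since it doubles as a store to LDS.
  void issueLoad(bool WritesLds) {
    ++Issued;
    if (WritesLds)
      LastLdsDmaScore = Issued;
  }

  // Counter value a fence would need to wait for so that the latest LDS DMA
  // has completed, or nullopt if nothing is pending.
  std::optional<unsigned> waitForLdsDma() const {
    if (LastLdsDmaScore <= Completed)
      return std::nullopt;
    return Issued - LastLdsDmaScore;
  }

  // Model the effect of waiting until at most Count events are outstanding.
  void applyWait(unsigned Count) {
    assert(Count <= Issued && "cannot leave more events pending than issued");
    Completed = std::max(Completed, Issued - Count);
  }
};
```

In this sketch, one LDS DMA followed by two ordinary loads yields a required count of 2: the fence only has to drain the counter down to the point where the DMA is guaranteed complete, not all the way to zero.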