[AMDGPU] Examine instructions in pending queues during scheduling #147653
base: main
Conversation
Examine instructions in the pending queue when scheduling. This makes instructions visible to scheduling heuristics even when they aren't immediately issuable due to hardware resource constraints.

The scheduler has two hardware resource modeling modes: an in-order mode, where instructions must be ready to issue before they can be scheduled, and an out-of-order mode, where instructions are always visible to the heuristics. Special handling exists for unbuffered processor resources in the out-of-order mode: these resources can cause pipeline stalls when used back-to-back, so the scheduler normally avoids them. However, for AMDGPU targets, managing register pressure and reducing spilling is critical enough to justify an exception to this approach.

This change enables examination of instructions that can't be immediately issued because they use an already occupied unbuffered resource. By making these instructions visible to the scheduling heuristics anyway, we gain more flexibility in scheduling decisions, potentially allowing better register pressure and hardware resource management.
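In outline, the patch keeps the existing walk over the available queue and adds a gated walk over the pending queue, evaluating those stalled instructions with a reduced heuristic set. Below is a simplified sketch of that shape in the context of GCNSchedStrategy.cpp; the helper name collectCandidates is made up for illustration, and the real logic (candidate initialization, pressure deltas, cycle bumping) is in the diff that follows.

// Gate: only look past the available queue when the combined queues are small
// and the target uses a buffered (out-of-order style) machine model.
static bool shouldCheckPending(SchedBoundary &Zone,
                               const TargetSchedModel *SchedModel) {
  const unsigned ReadyListLimit = 256;
  bool HasBufferedModel =
      SchedModel->hasInstrSchedModel() && SchedModel->getMicroOpBufferSize();
  return Zone.Available.size() + Zone.Pending.size() <= ReadyListLimit &&
         HasBufferedModel;
}

// Illustrative helper (name made up): gather every SUnit the heuristics may
// now consider, available first, then pending when the gate above allows it.
static void collectCandidates(SchedBoundary &Zone,
                              const TargetSchedModel *SchedModel,
                              SmallVectorImpl<SUnit *> &Candidates) {
  for (SUnit *SU : Zone.Available) // always visible, as before
    Candidates.push_back(SU);
  if (!shouldCheckPending(Zone, SchedModel))
    return;
  for (SUnit *SU : Zone.Pending) // newly visible; the patch evaluates these
    Candidates.push_back(SU);    // with the reduced tryPendingCandidate() set
}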
@llvm/pr-subscribers-backend-amdgpu
Author: Austin Kerbow (kerbowa)
Patch is 579.73 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147653.diff
17 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index fce8f36d45969..35886eb04c711 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -68,6 +68,14 @@ static cl::opt<bool> GCNTrackers(
cl::desc("Use the AMDGPU specific RPTrackers during scheduling"),
cl::init(false));
+static cl::opt<bool> ExaminePendingQueue(
+ "amdgpu-examine-pending-queue", cl::Hidden,
+ cl::desc(
+ "Examine instructions in the pending the pending queue when "
+ "scheduling. This makes instructions visible to heuristics that cannot "
+ "immediately be issued due to hardware resource constraints."),
+ cl::init(true));
+
const unsigned ScheduleMetrics::ScaleFactor = 100;
GCNSchedStrategy::GCNSchedStrategy(const MachineSchedContext *C)
@@ -319,17 +327,45 @@ void GCNSchedStrategy::initCandidate(SchedCandidate &Cand, SUnit *SU,
}
}
+static bool shouldCheckPending(SchedBoundary &Zone,
+ const TargetSchedModel *SchedModel) {
+ const unsigned ReadyListLimit = 256;
+ bool HasBufferedModel =
+ SchedModel->hasInstrSchedModel() && SchedModel->getMicroOpBufferSize();
+ return ExaminePendingQueue &&
+ Zone.Available.size() + Zone.Pending.size() <= ReadyListLimit &&
+ HasBufferedModel;
+}
+
+static SUnit *pickOnlyChoice(SchedBoundary &Zone,
+ const TargetSchedModel *SchedModel) {
+ if (!shouldCheckPending(Zone, SchedModel) || Zone.Pending.empty())
+ return Zone.pickOnlyChoice();
+ return nullptr;
+}
+
+#ifndef NDEBUG
+void GCNSchedStrategy::printCandidateDecision(const SchedCandidate &Current,
+ const SchedCandidate &Preferred) {
+ LLVM_DEBUG(dbgs() << "Prefer:\t\t"; DAG->dumpNode(*Preferred.SU));
+ if (Current.SU)
+ LLVM_DEBUG(dbgs() << "Not:\t"; DAG->dumpNode(*Current.SU));
+ LLVM_DEBUG(dbgs() << "Reason:\t\t"; traceCandidate(Preferred));
+}
+#endif
+
// This function is mostly cut and pasted from
// GenericScheduler::pickNodeFromQueue()
void GCNSchedStrategy::pickNodeFromQueue(SchedBoundary &Zone,
const CandPolicy &ZonePolicy,
const RegPressureTracker &RPTracker,
- SchedCandidate &Cand,
+ SchedCandidate &Cand, bool &IsPending,
bool IsBottomUp) {
const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo *>(TRI);
ArrayRef<unsigned> Pressure = RPTracker.getRegSetPressureAtPos();
unsigned SGPRPressure = 0;
unsigned VGPRPressure = 0;
+ IsPending = false;
if (DAG->isTrackingPressure()) {
if (!GCNTrackers) {
SGPRPressure = Pressure[AMDGPU::RegisterPressureSets::SReg_32];
@@ -342,8 +378,9 @@ void GCNSchedStrategy::pickNodeFromQueue(SchedBoundary &Zone,
VGPRPressure = T->getPressure().getArchVGPRNum();
}
}
- ReadyQueue &Q = Zone.Available;
- for (SUnit *SU : Q) {
+ LLVM_DEBUG(dbgs() << "Available Q:\n");
+ ReadyQueue &AQ = Zone.Available;
+ for (SUnit *SU : AQ) {
SchedCandidate TryCand(ZonePolicy);
initCandidate(TryCand, SU, Zone.isTop(), RPTracker, SRI, SGPRPressure,
@@ -355,27 +392,59 @@ void GCNSchedStrategy::pickNodeFromQueue(SchedBoundary &Zone,
// Initialize resource delta if needed in case future heuristics query it.
if (TryCand.ResDelta == SchedResourceDelta())
TryCand.initResourceDelta(Zone.DAG, SchedModel);
+ LLVM_DEBUG(printCandidateDecision(Cand, TryCand));
Cand.setBest(TryCand);
- LLVM_DEBUG(traceCandidate(Cand));
}
+#ifndef NDEBUG
+ else
+ printCandidateDecision(TryCand, Cand);
+#endif
+ }
+
+ if (!shouldCheckPending(Zone, SchedModel))
+ return;
+
+ LLVM_DEBUG(dbgs() << "Pending Q:\n");
+ ReadyQueue &PQ = Zone.Pending;
+ for (SUnit *SU : PQ) {
+
+ SchedCandidate TryCand(ZonePolicy);
+ initCandidate(TryCand, SU, Zone.isTop(), RPTracker, SRI, SGPRPressure,
+ VGPRPressure, IsBottomUp);
+ // Pass SchedBoundary only when comparing nodes from the same boundary.
+ SchedBoundary *ZoneArg = Cand.AtTop == TryCand.AtTop ? &Zone : nullptr;
+ tryPendingCandidate(Cand, TryCand, ZoneArg);
+ if (TryCand.Reason != NoCand) {
+ // Initialize resource delta if needed in case future heuristics query it.
+ if (TryCand.ResDelta == SchedResourceDelta())
+ TryCand.initResourceDelta(Zone.DAG, SchedModel);
+ LLVM_DEBUG(printCandidateDecision(Cand, TryCand));
+ IsPending = true;
+ Cand.setBest(TryCand);
+ }
+#ifndef NDEBUG
+ else
+ printCandidateDecision(TryCand, Cand);
+#endif
}
}
// This function is mostly cut and pasted from
// GenericScheduler::pickNodeBidirectional()
-SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
+SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode,
+ bool &PickedPending) {
// Schedule as far as possible in the direction of no choice. This is most
// efficient, but also provides the best heuristics for CriticalPSets.
- if (SUnit *SU = Bot.pickOnlyChoice()) {
+ if (SUnit *SU = pickOnlyChoice(Bot, SchedModel)) {
IsTopNode = false;
return SU;
}
- if (SUnit *SU = Top.pickOnlyChoice()) {
+ if (SUnit *SU = pickOnlyChoice(Top, SchedModel)) {
IsTopNode = true;
return SU;
}
- // Set the bottom-up policy based on the state of the current bottom zone and
- // the instructions outside the zone, including the top zone.
+ // Set the bottom-up policy based on the state of the current bottom zone
+ // and the instructions outside the zone, including the top zone.
CandPolicy BotPolicy;
setPolicy(BotPolicy, /*IsPostRA=*/false, Bot, &Top);
// Set the top-down policy based on the state of the current top zone and
@@ -383,12 +452,14 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
CandPolicy TopPolicy;
setPolicy(TopPolicy, /*IsPostRA=*/false, Top, &Bot);
+ bool BotPending = false;
// See if BotCand is still valid (because we previously scheduled from Top).
LLVM_DEBUG(dbgs() << "Picking from Bot:\n");
if (!BotCand.isValid() || BotCand.SU->isScheduled ||
BotCand.Policy != BotPolicy) {
BotCand.reset(CandPolicy());
pickNodeFromQueue(Bot, BotPolicy, DAG->getBotRPTracker(), BotCand,
+ BotPending,
/*IsBottomUp=*/true);
assert(BotCand.Reason != NoCand && "failed to find the first candidate");
} else {
@@ -398,6 +469,7 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
SchedCandidate TCand;
TCand.reset(CandPolicy());
pickNodeFromQueue(Bot, BotPolicy, DAG->getBotRPTracker(), TCand,
+ BotPending,
/*IsBottomUp=*/true);
assert(TCand.SU == BotCand.SU &&
"Last pick result should correspond to re-picking right now");
@@ -405,12 +477,14 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
#endif
}
+ bool TopPending = false;
// Check if the top Q has a better candidate.
LLVM_DEBUG(dbgs() << "Picking from Top:\n");
if (!TopCand.isValid() || TopCand.SU->isScheduled ||
TopCand.Policy != TopPolicy) {
TopCand.reset(CandPolicy());
pickNodeFromQueue(Top, TopPolicy, DAG->getTopRPTracker(), TopCand,
+ TopPending,
/*IsBottomUp=*/false);
assert(TopCand.Reason != NoCand && "failed to find the first candidate");
} else {
@@ -420,6 +494,7 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
SchedCandidate TCand;
TCand.reset(CandPolicy());
pickNodeFromQueue(Top, TopPolicy, DAG->getTopRPTracker(), TCand,
+ TopPending,
/*IsBottomUp=*/false);
assert(TCand.SU == TopCand.SU &&
"Last pick result should correspond to re-picking right now");
@@ -430,12 +505,21 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
// Pick best from BotCand and TopCand.
LLVM_DEBUG(dbgs() << "Top Cand: "; traceCandidate(TopCand);
dbgs() << "Bot Cand: "; traceCandidate(BotCand););
- SchedCandidate Cand = BotCand;
- TopCand.Reason = NoCand;
- tryCandidate(Cand, TopCand, nullptr);
- if (TopCand.Reason != NoCand) {
- Cand.setBest(TopCand);
+ SchedCandidate Cand = BotPending ? TopCand : BotCand;
+ SchedCandidate TryCand = BotPending ? BotCand : TopCand;
+ PickedPending = BotPending && TopPending;
+
+ TryCand.Reason = NoCand;
+ if (BotPending || TopPending) {
+ PickedPending |= tryPendingCandidate(Cand, TopCand, nullptr);
+ } else {
+ tryCandidate(Cand, TryCand, nullptr);
}
+
+ if (TryCand.Reason != NoCand) {
+ Cand.setBest(TryCand);
+ }
+
LLVM_DEBUG(dbgs() << "Picking: "; traceCandidate(Cand););
IsTopNode = Cand.AtTop;
@@ -450,35 +534,46 @@ SUnit *GCNSchedStrategy::pickNode(bool &IsTopNode) {
Bot.Available.empty() && Bot.Pending.empty() && "ReadyQ garbage");
return nullptr;
}
+ bool PickedPending;
SUnit *SU;
do {
+ PickedPending = false;
if (RegionPolicy.OnlyTopDown) {
- SU = Top.pickOnlyChoice();
+ SU = pickOnlyChoice(Top, SchedModel);
if (!SU) {
CandPolicy NoPolicy;
TopCand.reset(NoPolicy);
pickNodeFromQueue(Top, NoPolicy, DAG->getTopRPTracker(), TopCand,
+ PickedPending,
/*IsBottomUp=*/false);
assert(TopCand.Reason != NoCand && "failed to find a candidate");
SU = TopCand.SU;
}
IsTopNode = true;
} else if (RegionPolicy.OnlyBottomUp) {
- SU = Bot.pickOnlyChoice();
+ SU = pickOnlyChoice(Bot, SchedModel);
if (!SU) {
CandPolicy NoPolicy;
BotCand.reset(NoPolicy);
pickNodeFromQueue(Bot, NoPolicy, DAG->getBotRPTracker(), BotCand,
+ PickedPending,
/*IsBottomUp=*/true);
assert(BotCand.Reason != NoCand && "failed to find a candidate");
SU = BotCand.SU;
}
IsTopNode = false;
} else {
- SU = pickNodeBidirectional(IsTopNode);
+ SU = pickNodeBidirectional(IsTopNode, PickedPending);
}
} while (SU->isScheduled);
+ if (PickedPending) {
+ unsigned ReadyCycle = IsTopNode ? SU->TopReadyCycle : SU->BotReadyCycle;
+ SchedBoundary &Zone = IsTopNode ? Top : Bot;
+ Zone.bumpCycle(ReadyCycle);
+ Zone.releasePending();
+ }
+
if (SU->isTopReady())
Top.removeReady(SU);
if (SU->isBottomReady())
@@ -524,6 +619,47 @@ GCNSchedStageID GCNSchedStrategy::getNextStage() const {
return *std::next(CurrentStage);
}
+bool GCNSchedStrategy::tryPendingCandidate(SchedCandidate &Cand,
+ SchedCandidate &TryCand,
+ SchedBoundary *Zone) const {
+ // Initialize the candidate if needed.
+ if (!Cand.isValid()) {
+ TryCand.Reason = NodeOrder;
+ return true;
+ }
+
+ // Bias PhysReg Defs and copies to their uses and defined respectively.
+ if (tryGreater(biasPhysReg(TryCand.SU, TryCand.AtTop),
+ biasPhysReg(Cand.SU, Cand.AtTop), TryCand, Cand, PhysReg))
+ return TryCand.Reason != NoCand;
+
+ // Avoid exceeding the target's limit.
+ if (DAG->isTrackingPressure() &&
+ tryPressure(TryCand.RPDelta.Excess, Cand.RPDelta.Excess, TryCand, Cand,
+ RegExcess, TRI, DAG->MF))
+ return TryCand.Reason != NoCand;
+
+ // Avoid increasing the max critical pressure in the scheduled region.
+ if (DAG->isTrackingPressure() &&
+ tryPressure(TryCand.RPDelta.CriticalMax, Cand.RPDelta.CriticalMax,
+ TryCand, Cand, RegCritical, TRI, DAG->MF))
+ return TryCand.Reason != NoCand;
+
+ bool SameBoundary = Zone != nullptr;
+ if (SameBoundary) {
+ TryCand.initResourceDelta(DAG, SchedModel);
+ if (tryLess(TryCand.ResDelta.CritResources, Cand.ResDelta.CritResources,
+ TryCand, Cand, ResourceReduce))
+ return TryCand.Reason != NoCand;
+ if (tryGreater(TryCand.ResDelta.DemandedResources,
+ Cand.ResDelta.DemandedResources, TryCand, Cand,
+ ResourceDemand))
+ return TryCand.Reason != NoCand;
+ }
+
+ return false;
+}
+
GCNMaxOccupancySchedStrategy::GCNMaxOccupancySchedStrategy(
const MachineSchedContext *C, bool IsLegacyScheduler)
: GCNSchedStrategy(C) {
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
index 94cd795bbc8f6..c78835c8d5a77 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
@@ -44,17 +44,34 @@ raw_ostream &operator<<(raw_ostream &OS, const GCNSchedStageID &StageID);
/// heuristics to determine excess/critical pressure sets.
class GCNSchedStrategy : public GenericScheduler {
protected:
- SUnit *pickNodeBidirectional(bool &IsTopNode);
+ SUnit *pickNodeBidirectional(bool &IsTopNode, bool &PickedPending);
void pickNodeFromQueue(SchedBoundary &Zone, const CandPolicy &ZonePolicy,
const RegPressureTracker &RPTracker,
- SchedCandidate &Cand, bool IsBottomUp);
+ SchedCandidate &Cand, bool &IsPending,
+ bool IsBottomUp);
void initCandidate(SchedCandidate &Cand, SUnit *SU, bool AtTop,
const RegPressureTracker &RPTracker,
const SIRegisterInfo *SRI, unsigned SGPRPressure,
unsigned VGPRPressure, bool IsBottomUp);
+ /// Evaluates instructions in the pending queue using a subset of scheduling
+ /// heuristics.
+ ///
+ /// Instructions that cannot be issued due to hardware constraints are placed
+ /// in the pending queue rather than the available queue, making them normally
+ /// invisible to scheduling heuristics. However, in certain scenarios (such as
+ /// avoiding register spilling), it may be beneficial to consider scheduling
+ /// these not-yet-ready instructions.
+ bool tryPendingCandidate(SchedCandidate &Cand, SchedCandidate &TryCand,
+ SchedBoundary *Zone) const;
+
+#ifndef NDEBUG
+ void printCandidateDecision(const SchedCandidate &Current,
+ const SchedCandidate &Preferred);
+#endif
+
std::vector<unsigned> Pressure;
std::vector<unsigned> MaxPressure;
diff --git a/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll b/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
index 668219875db72..86505107587f1 100644
--- a/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
+++ b/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
@@ -947,6 +947,7 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1020
+; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1028
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:2044
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:2040
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:2036
@@ -1201,7 +1202,6 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1040
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1036
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1032
-; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1028
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1024
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1016
; GFX9-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1012
@@ -1466,6 +1466,7 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: v_mov_b32_e32 v1, 0
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1020
+; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1028
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:2044
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:2040
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:2036
@@ -1720,7 +1721,6 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1040
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1036
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1032
-; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1028
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1024
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1016
; GFX10-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:1012
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir
index aad6e031aa9ed..ac91dadc07995 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir
@@ -6,1145 +6,1149 @@
define amdgpu_kernel void @largeInterleave() #0 { ret void }
; GCN-LABEL: largeInterleave:
; GCN: ; %bb.0:
+ ; GCN-NEXT: ; implicit-def: $sgpr17
+ ; GCN-NEXT: ; implicit-def: $vgpr64
+ ; GCN-NEXT: ; implicit-def: $vgpr66
; GCN-NEXT: ; implicit-def: $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15
- ; GCN-NEXT: ; implicit-def: $vgpr0
- ; GCN-NEXT: ; implicit-def: $vgpr2
- ; GCN-NEXT: ; implicit-def: $vgpr1
- ; GCN-NEXT: ; implicit-def: $vgpr8
+ ; GCN-NEXT: ; implicit-def: $vgpr65
+ ; GCN-NEXT: ; implicit-def: $vgpr72
+ ; GCN-NEXT: ; implicit-def: $vgpr238
+ ; GCN-NEXT: ; implicit-def: $vgpr152_vgpr153_vgpr154_vgpr155
+ ; GCN-NEXT: ; implicit-def: $vgpr80
+ ; GCN-NEXT: ; implicit-def: $vgpr81
+ ; GCN-NEXT: ; implicit-def: $vgpr82
+ ; GCN-NEXT: ; implicit-def: $vgpr83
+ ; GCN-NEXT: ; implicit-def: $vgpr84
+ ; GCN-NEXT: ; implicit-def: $vgpr85
+ ; GCN-NEXT: ; implicit-def: $vgpr86
+ ; GCN-NEXT: ; implicit-def: $vgpr87
+ ; GCN-NEXT: ; implicit-def: $vgpr88
+ ; GCN-NEXT: ; implicit-def: $vgpr89
+ ; GCN-NEXT: ; implicit-def: $vgpr90
+ ; GCN-NEXT: ; implicit-def: $vgpr91
+ ; GCN-NEXT: ; implicit-def: $vgpr92
+ ; GCN-NEXT: ; implicit-def: $vgpr93
; GCN-NEXT: ; implicit-def: $vgpr94
- ; GCN-NEXT: ; implicit-def: $vgpr76_vgpr77_vgpr78_vgpr79
- ; GCN-NEXT: ; implicit-def: $vgpr106
- ; GCN-NEXT: ; implicit-def: $vgpr132
- ; GCN-NEXT: ; implicit-def: $vgpr133
- ; GCN-NEXT: ; implicit-def: $vgpr139
- ; GCN-NEXT: ; implicit-def: $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127
- ; GCN-NEXT: ; iglp_opt mask(0x00000002)
- ; GCN-NEXT: ; implicit-def: $sgpr0
+ ; GCN-NEXT: ; implicit-def: $vgpr73
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
- ; GCN-NEXT: v_readfirstlane_b32 s7, v0
+ ; GCN-NEXT: v_add_u32_e32 v232, v73, v80
+ ; GCN-NEXT: v_readfirstlane_b32 s17, v64
+ ; GCN-NEXT: ; implicit-def: $sgpr15
; GCN-NEXT: ; implicit-def: $sgpr8_sgpr9_sgpr10_sgpr11
- ; GCN-NEXT: ; kill: killed $sgpr8_sgpr9_sgpr10_sgpr11
- ; GCN-NEXT: ; implicit-def: $sgpr5
- ; GCN-NEXT: s_nop 1
- ; GCN-NEXT: v_lshl_add_u32 v0, s7, 4, v2
- ; GCN-NEXT: v_mul_lo_u32 v0, v0, s6
- ; GCN-NEXT: v_add_lshl_u32 v92, v0, v1, 1
- ; GCN-NEXT: v_add_u32_e32 v93, s0, v92
- ; GCN-NEXT: buffer_load_dwordx4 v[0:3], v92, s[8:11], 0 offen sc0 sc1
+ ; GCN-NEXT: v_add_u32_e32 v234, v73, v81
+ ; GCN-NEXT: v_add_u32_e32 v235, v73, v82
+ ; GCN-NEXT: v_lshl_add_u32 v64, s17, 4, v66
+ ; GCN-NEXT: v_mul_lo_u32 v64, v64, s6
+ ; GCN-NEXT: v_add_lshl_u32 v222, v64, v65, 1
+ ; GCN-NEXT: v_add_u32_e32 v95, s15, v222
+ ; GCN-NEXT: buffer_load_dwordx4 v[64:67], v222, s[8:11], 0 offen sc0 sc1
; GCN...
[truncated]
LLVM_DEBUG(dbgs() << "Prefer:\t\t"; DAG->dumpNode(*Preferred.SU));
if (Current.SU)
  LLVM_DEBUG(dbgs() << "Not:\t"; DAG->dumpNode(*Current.SU));
LLVM_DEBUG(dbgs() << "Reason:\t\t"; traceCandidate(Preferred));
Use one debug LLVM_DEBUG({})
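What the reviewer is suggesting, sketched here rather than taken from the patch: one LLVM_DEBUG invocation wrapping a braced block, the usual LLVM idiom, which also compiles away entirely when NDEBUG is defined.

void GCNSchedStrategy::printCandidateDecision(const SchedCandidate &Current,
                                              const SchedCandidate &Preferred) {
  LLVM_DEBUG({
    dbgs() << "Prefer:\t\t";
    DAG->dumpNode(*Preferred.SU);
    if (Current.SU) {
      dbgs() << "Not:\t";
      DAG->dumpNode(*Current.SU);
    }
    dbgs() << "Reason:\t\t";
    traceCandidate(Preferred);
  });
}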
#ifndef NDEBUG
  else
    printCandidateDecision(TryCand, Cand);
#endif
Don't have a macro conditional else
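Sketch of one way to honor this: because LLVM_DEBUG drops its argument entirely in NDEBUG builds, the pending-queue loop body can use a plain else with no preprocessor guard at the call site.

    if (TryCand.Reason != NoCand) {
      // Initialize resource delta if needed in case future heuristics query it.
      if (TryCand.ResDelta == SchedResourceDelta())
        TryCand.initResourceDelta(Zone.DAG, SchedModel);
      LLVM_DEBUG(printCandidateDecision(Cand, TryCand));
      IsPending = true;
      Cand.setBest(TryCand);
    } else {
      LLVM_DEBUG(printCandidateDecision(TryCand, Cand));
    }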
  return nullptr;
}

#ifndef NDEBUG
Just leave it in? The body will be empty in release build anyway
@@ -319,17 +327,45 @@ void GCNSchedStrategy::initCandidate(SchedCandidate &Cand, SUnit *SU,
  }
}

static bool shouldCheckPending(SchedBoundary &Zone,
                               const TargetSchedModel *SchedModel) {
  const unsigned ReadyListLimit = 256;
Can you replace the bool flag with a value for this limit? Disable will be implied by 0
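A sketch of that suggestion; the option name below is hypothetical, not from the patch. The limit itself becomes the command-line value, defaulting to the current 256, and 0 means the pending queue is never examined.

static cl::opt<unsigned> PendingQueueLimit(
    "amdgpu-scheduler-pending-queue-limit", cl::Hidden,
    cl::desc("Max (Available + Pending) size at which the pending queue is "
             "still examined during scheduling; 0 disables the feature"),
    cl::init(256));

static bool shouldCheckPending(SchedBoundary &Zone,
                               const TargetSchedModel *SchedModel) {
  bool HasBufferedModel =
      SchedModel->hasInstrSchedModel() && SchedModel->getMicroOpBufferSize();
  return PendingQueueLimit != 0 &&
         Zone.Available.size() + Zone.Pending.size() <= PendingQueueLimit &&
         HasBufferedModel;
}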