This repository was archived by the owner on Apr 28, 2023. It is now read-only.

Commit 900cb42

promoteToSharedAtDepth: ignore scopes below thread mapping
Band nodes at the given depth may be above or below the thread mapping. In the latter case, promoteToSharedBelow would throw an IncorrectScope exception because promotion to shared memory below a thread mapping node is forbidden and is generally nonsensical. In a branching tree, there is no guarantee that nodes at the given schedule depth are all above or all below the thread mapping. Simply ignore the subtrees that have a thread-mapping ancestor node when performing shared memory promotion at depth.

Leaving promoteToSharedBelow as is (throwing an exception) because it may end up being reused by different callers. Arguably, promoteToSharedAtDepth is a higher-level API and should not throw if only one subtree is problematic.
1 parent b0b73cb commit 900cb42
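
The behavior described in the commit message can be illustrated with a small standalone sketch. It is not the Tensor Comprehensions implementation: `Node`, `hasThreadMappedAncestor`, `promoteAtDepth` and `demoPromotedBands` are hypothetical names invented here, and "promotion" is reduced to recording a band's name. The sketch only shows the control flow: in a branching tree, bands under a thread-mapping node are skipped instead of causing an exception, while sibling subtrees are still processed.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a schedule tree node (not the real ScheduleTree).
struct Node {
  std::string name;
  bool isThreadMapping = false;
  const Node* parent = nullptr;
  std::vector<std::unique_ptr<Node>> children;

  // Appends a child and returns a non-owning pointer to it.
  Node* addChild(std::string childName, bool threadMapping = false) {
    auto child = std::make_unique<Node>();
    child->name = std::move(childName);
    child->isThreadMapping = threadMapping;
    child->parent = this;
    children.push_back(std::move(child));
    return children.back().get();
  }
};

// True if any strict ancestor of "node" is a thread-mapping node.
bool hasThreadMappedAncestor(const Node* node) {
  for (const Node* p = node->parent; p != nullptr; p = p->parent) {
    if (p->isThreadMapping) {
      return true;
    }
  }
  return false;
}

// Returns the names of the bands that would be promoted. Bands located
// below a thread mapping are silently skipped rather than reported as errors.
std::vector<std::string> promoteAtDepth(const std::vector<const Node*>& bands) {
  std::vector<std::string> promoted;
  for (const Node* band : bands) {
    if (hasThreadMappedAncestor(band)) {
      continue;  // assumption: skipping is all the higher-level API needs to do
    }
    promoted.push_back(band->name);
  }
  return promoted;
}

// Builds a branching tree where one band sits above the thread mapping
// and another sits below it, then runs the promotion sketch on both.
std::vector<std::string> demoPromotedBands() {
  Node root;
  root.name = "root";
  Node* bandAbove = root.addChild("bandAbove");
  Node* mapping = root.addChild("threadMapping", /*threadMapping=*/true);
  Node* bandBelow = mapping->addChild("bandBelow");
  return promoteAtDepth({bandAbove, bandBelow});
}
```

In this toy tree only `bandAbove` is reported as promoted; `bandBelow` has a thread-mapping ancestor and is ignored.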

2 files changed, +18 -8 lines changed

tc/core/polyhedral/cuda/memory_promotion_heuristic.cc

Lines changed: 10 additions & 2 deletions
@@ -565,7 +565,8 @@ inline bool isThreadMappedBand(const detail::ScheduleTree* tree) {
 /*
  * For every place in the schedule tree where schedule depth (i.e., the number
  * of preceding band members) is "depth", promote tensor reference groups to
- * shared memory. Split bands if necessary to insert promotions.
+ * shared memory if there is no thread mapping above this place. Split bands
+ * if necessary to insert promotions.
  *
  * Use at most "maxMemory" bytes. If a groups does not fit the remaining
  * memory, do not promote it and keep looking for a smaller group.
@@ -600,9 +601,16 @@ void promoteToSharedAtDepth(
   // immediately below it in the tree. In particular, promote if the
   // approximated footprint fits into the remaining memory, and the reference
   // group either features reuse or is accessed in a non-coalesced way, or
-  // both.
+  // both. Do not promote if the band node is located below the thread mapping
+  // as promotion to shared is not allowed in this context.
   size_t remainingMemory = maxMemory;
   for (auto bandNode : bands) {
+    if (isInThreadMappedScope(root, bandNode)) {
+      LOG_IF(INFO, FLAGS_debug_tc_mapper)
+          << "not promoting subtree to shared because it is below "
+          << "a thread mapping node";
+      continue;
+    }
     promoteToSharedBelow(scop, bandNode, remainingMemory);
   }
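
The comments in this file describe a greedy budget: use at most "maxMemory" bytes, and when a group does not fit the remaining memory, skip it and keep looking for a smaller one. A minimal sketch of that budgeting rule follows. It assumes each reference group is reduced to a name and an approximate footprint; `Group` and `promoteWithinBudget` are hypothetical names, and the reuse and coalescing checks mentioned in the comments are deliberately left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reference group: a label plus its approximate footprint.
struct Group {
  std::string name;
  std::size_t footprintBytes;
};

// Greedily promotes groups in the given order while they fit into the
// remaining budget. A group that does not fit is skipped, and the loop
// keeps looking for smaller groups later in the list.
std::vector<std::string> promoteWithinBudget(
    const std::vector<Group>& groups, std::size_t maxMemory) {
  std::size_t remainingMemory = maxMemory;
  std::vector<std::string> promoted;
  for (const Group& group : groups) {
    if (group.footprintBytes > remainingMemory) {
      continue;  // too large for what is left; try the next group
    }
    remainingMemory -= group.footprintBytes;
    promoted.push_back(group.name);
  }
  return promoted;
}
```

For example, with a budget of 8192 bytes and groups of 6000, 4000 and 2000 bytes in that order, the first and third groups are promoted and the second is skipped because only 2192 bytes remain when it is considered.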

test/test_cuda_mapper_memory_promotion.cc

Lines changed: 8 additions & 6 deletions
@@ -437,12 +437,14 @@ TEST_F(MapperMemoryPromotionRAW, fitAtOuterDepths) {
       << "expected one reference group to be promoted";
 }

-TEST_F(MapperMemoryPromotionRAW, throwIfCopiesBelowThreads) {
-  EXPECT_THROW(
-      makeWithSharedGreedy(42, 40, 64, 64, 3, 8192), promotion::IncorrectScope);
-
-  EXPECT_THROW(
-      makeWithSharedGreedy(42, 40, 64, 64, 4, 8192), promotion::IncorrectScope);
+TEST_F(MapperMemoryPromotionRAW, noSharedPromotionBelowThreads) {
+  auto mscop1 = makeWithSharedGreedy(42, 40, 64, 64, 3, 8192);
+  EXPECT_EQ(mscop1->scop().promotedDecls().size(), 0u)
+      << "expected no promotion below threads";
+
+  auto mscop2 = makeWithSharedGreedy(42, 40, 64, 64, 4, 8192);
+  EXPECT_EQ(mscop2->scop().promotedDecls().size(), 0u)
+      << "expected no promotion below threads";
 }

 class MatMulBias : public TestMapper {
