This repository was archived by the owner on Apr 28, 2023. It is now read-only.

Commit 147597a

register promotion: add promoteToRegistersAtDepth
This function attempts register promotion below the given number of schedule dimensions, splitting bands if necessary to create a proper scope in the tree. It will be used in an upcoming commit.

I did consider making the promotion depth relative to the thread mapping, but there is a situation I did not want to handle. Consider a schedule tree with a sequence node below a band with 2 members. The first child of the sequence has two 2-member bands, with the innermost one mapped to threads. The second child has only one 2-member band, which is mapped to threads. A relative depth of "-4" would make me leave both children and perform promotion _both_ above and below the common ancestor band, which does not really make sense.

We could certainly define a rule to handle such situations, such as using the deepest common depth, but we want to go in a different direction (band-local decisions in the tuner) anyway.
1 parent 30eae2a commit 147597a

2 files changed: 55 additions and 0 deletions


tc/core/polyhedral/cuda/memory_promotion_heuristic.cc

Lines changed: 53 additions & 0 deletions
@@ -602,6 +602,15 @@ void promoteToSharedGreedy(
     scop.insertSyncsAroundCopies(bandNode);
   }
 }
+
+/*
+ * Check if "tree" is a band node mapped to threads. In particular, check that
+ * "tree" is a band and a thread-specific node appears as its only child.
+ */
+inline bool isThreadMappedBand(const detail::ScheduleTree* tree) {
+  return matchOne(band(threadSpecific(any())), tree) ||
+      matchOne(band(threadSpecific()), tree);
+}
 } // namespace

 void promoteGreedilyAtDepth(
@@ -720,6 +729,50 @@ void promoteToRegistersBelow(MappedScop& mscop, detail::ScheduleTree* scope) {
   }
 }

+/*
+ * Promote to registers below "depth" schedule dimensions. Split bands if
+ * necessary to create promotion scopes. Do not promote if it would require
+ * splitting the band mapped to threads as we assume only one band can be
+ * mapped.
+ */
+void promoteToRegistersAtDepth(MappedScop& mscop, size_t depth) {
+  using namespace detail;
+
+  auto root = mscop.scop().scheduleRoot();
+
+  // 1. Collect all bands with a member located at the given depth in the
+  // overall schedule. Make sure this is the last member of the band by
+  // splitting off the subsequent members into a different band. Ignore bands
+  // mapped to threads if splitting is required as it would break the invariant
+  // of a single band being mapped to threads in a subtree.
+  // TODO: allow splitting the thread-mapped bands; for example, tile them
+  // explicitly with block size, use the point loops for thread mapping
+  // but ignore them in depth computation.
+  auto bands = bandsContainingScheduleDepth(root, depth);
+  bands = functional::Filter(
+      [root, depth](ScheduleTree* tree) {
+        auto band = tree->elemAs<ScheduleTreeElemBand>();
+        return !isThreadMappedBand(tree) ||
+            tree->scheduleDepth(root) + band->nMember() == depth;
+      },
+      bands);
+  bands = bandsSplitAfterDepth(bands, root, depth);
+
+  // 2. We don't want copies inserted between thread-mapped bands and the
+  // thread-specific marker, but rather below that marker. If any of the bands
+  // are mapped to threads, take their first children as promotion scope
+  // instead of the band itself.
+  std::function<ScheduleTree*(ScheduleTree*)> findScope =
+      [](ScheduleTree* tree) {
+        return isThreadMappedBand(tree) ? tree->child({0}) : tree;
+      };
+  auto scopes = functional::Map(findScope, bands);
+
+  for (auto scope : scopes) {
+    promoteToRegistersBelow(mscop, scope);
+  }
+}
+
 // Promote at the positions of the thread specific markers.
 void promoteToRegistersBelowThreads(MappedScop& mscop, size_t nRegisters) {
   auto& scop = mscop.scop();

tc/core/polyhedral/cuda/memory_promotion_heuristic.h

Lines changed: 2 additions & 0 deletions
@@ -45,5 +45,7 @@ void promoteGreedilyAtDepth(
 void promoteToRegistersBelow(MappedScop& mscop, detail::ScheduleTree* scope);

 void promoteToRegistersBelowThreads(MappedScop& scop, std::size_t nRegisters);
+void promoteToRegistersAtDepth(MappedScop& scop, std::size_t depth);
+
 } // namespace polyhedral
 } // namespace tc
