This repository was archived by the owner on Apr 28, 2023. It is now read-only.

Commit c6730e3

promoteToRegistersAtDepth: limit the total number of elements promoted
The limit applies per thread and accumulates across all subtrees where promotion is performed. By default, it is set to SIZE_MAX, which preserves backwards-compatible behavior in all sensible cases (anything that required more than SIZE_MAX registers would have been spilled to global memory and still would not have fit). This limit will be exposed as a mapping option in an upcoming commit.
1 parent 9155a07 commit c6730e3
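
The budget threading described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the Tensor Comprehensions implementation: `Scope`, `promoteBelow`, and `promoteAtDepth` are hypothetical stand-ins, and the policy of skipping candidates that do not fit is an assumption, since the body of `promoteToRegistersBelow` is not part of this commit. Only the pattern of returning the remaining budget and feeding it to the next scope is taken from the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a promotion scope: the per-thread element count
// of each candidate that could be promoted to registers under that scope.
using Scope = std::vector<std::size_t>;

// Assumed policy: promote every candidate that still fits in the budget and
// skip the others. Returns the budget left after this scope.
std::size_t promoteBelow(
    const Scope& scope,
    std::size_t maxElements,
    std::vector<std::size_t>& promoted) {
  for (std::size_t nElements : scope) {
    if (nElements > maxElements) {
      continue; // would exceed the per-thread limit
    }
    promoted.push_back(nElements);
    maxElements -= nElements;
  }
  return maxElements;
}

// Mirrors the loop in the diff: the remaining budget returned for one scope
// becomes the limit for the next one.
std::size_t promoteAtDepth(
    const std::vector<Scope>& scopes,
    std::vector<std::size_t>& promoted,
    std::size_t maxElements = SIZE_MAX) {
  for (const Scope& scope : scopes) {
    maxElements = promoteBelow(scope, maxElements, promoted);
  }
  return maxElements;
}
```

With a budget of 10 and scopes `{4, 8}`, `{2, 16}`, `{3}`, this sketch promotes 4, 2, and 3 elements and leaves a budget of 1. Omitting the last argument falls back to SIZE_MAX, so every candidate is promoted, which corresponds to the backwards-compatible default described above.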

2 files changed: +10 -4 lines changed

tc/core/polyhedral/cuda/memory_promotion_heuristic.cc

Lines changed: 6 additions & 3 deletions
@@ -748,9 +748,12 @@ size_t promoteToRegistersBelow(
  * Promote to registers below "depth" schedule dimensions. Split bands if
  * necessary to create promotion scopes. Do not promote if it would require
  * splitting the band mapped to threads as we assume only one band can be
- * mapped.
+ * mapped. Use at most "maxElements" per thread in all promoted subtrees.
  */
-void promoteToRegistersAtDepth(MappedScop& mscop, size_t depth) {
+void promoteToRegistersAtDepth(
+    MappedScop& mscop,
+    size_t depth,
+    size_t maxElements) {
   using namespace detail;
 
   auto root = mscop.scop().scheduleRoot();
@@ -784,7 +787,7 @@ void promoteToRegistersAtDepth(MappedScop& mscop, size_t depth) {
   auto scopes = functional::Map(findScope, bands);
 
   for (auto scope : scopes) {
-    promoteToRegistersBelow(mscop, scope);
+    maxElements = promoteToRegistersBelow(mscop, scope, maxElements);
   }
 }
 

tc/core/polyhedral/cuda/memory_promotion_heuristic.h

Lines changed: 4 additions & 1 deletion
@@ -46,7 +46,10 @@ size_t promoteToRegistersBelow(
     detail::ScheduleTree* scope,
     std::size_t maxElements = SIZE_MAX);
 
-void promoteToRegistersAtDepth(MappedScop& scop, std::size_t depth);
+void promoteToRegistersAtDepth(
+    MappedScop& scop,
+    std::size_t depth,
+    std::size_t maxElements = SIZE_MAX);
 
 } // namespace cuda
 } // namespace polyhedral
