
Commit a2aecd3

promotionImprovesCoalescing: use partial schedule instead of full
The check whether promotion to shared memory improves coalescing is performed by looking at the schedule dimension that is mapped to CUDA thread x. The existing implementation relies on a so-called "full schedule" that contains all schedule dimensions. In practice, the partial schedule up to the dimension mapped to thread x is sufficient. Compute this partial schedule inside promotionImprovesCoalescing instead of precomputing the "full schedule" externally.
1 parent 5f384ce commit a2aecd3
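For context, "coalescing" here refers to the CUDA global-memory access pattern in which consecutive thread x values touch consecutive addresses, letting a warp's loads combine into few memory transactions. A minimal standalone illustration of the two patterns (the kernels and the row-major layout are invented for this example, not taken from the repository):

// Coalesced: threadIdx.x indexes the innermost (fastest-varying) dimension,
// so the threads of a warp read one contiguous segment of "in".
__global__ void rowMajorCopy(float* out, const float* in, int n) {
  int row = blockIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  out[row * n + col] = in[row * n + col];
}

// Non-coalesced: threadIdx.x indexes the outer dimension, so adjacent
// threads touch addresses n floats apart. Staging such a tile in shared
// memory is the kind of promotion this heuristic decides on.
__global__ void columnIndexedCopy(float* out, const float* in, int n) {
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  int col = blockIdx.y;
  out[row * n + col] = in[row * n + col];
}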

File tree

1 file changed: 6 additions, 14 deletions

tc/core/polyhedral/cuda/memory_promotion_heuristic.cc

Lines changed: 6 additions & 14 deletions
@@ -264,8 +264,7 @@ const detail::ScheduleTree* findThreadMappingAncestor(
 bool promotionImprovesCoalescing(
     const detail::ScheduleTree* root,
     const detail::ScheduleTree* node,
-    const TensorReferenceGroup& group,
-    isl::union_map schedule) {
+    const TensorReferenceGroup& group) {
   auto originalAccesses = group.originalAccesses();

   auto tensorDim = group.approximation.dim();

@@ -279,6 +278,7 @@ bool promotionImprovesCoalescing(
   auto depth = marker->scheduleDepth(root);
   auto activePoints = activeDomainPoints(root, mapping);
   auto localAccesses = originalAccesses.intersect_domain(activePoints);
+  auto schedule = prefixSchedule(root, marker);
   auto scheduledAccesses = localAccesses.apply_domain(schedule);
   for (auto access : isl::UnionAsVector<isl::union_map>(scheduledAccesses)) {
     auto scheduleSpace = access.get_space().domain();

@@ -486,14 +486,11 @@ std::vector<detail::ScheduleTree*> bandsSplitAfterDepth(
 /*
  * Promote to shared memory in "scop" below the node "bandNode". Use at most
  * "remainingMemory" bytes, and update the variable to reflect the amount of
- * available shared memory remaining after promotion. "fullSched" is the union
- * of schedules at leaves of the schedule tree, expected to be computed by
- * "fullSchedule".
+ * available shared memory remaining after promotion.
  */
 void promoteToSharedBelow(
     Scop& scop,
     detail::ScheduleTree* bandNode,
-    isl::union_map fullSched,
     size_t& remainingMemory) {
   auto root = scop.scheduleRoot();
   auto partialSched = partialSchedule(root, bandNode);

@@ -560,7 +557,7 @@ void promoteToSharedBelow(
     // Do not promote if the group features no reuse and is accessed in a
     // coalesced way.
     if (!hasReuseWithin(*group, partialSchedMupa) &&
-        !promotionImprovesCoalescing(root, bandNode, *group, fullSched)) {
+        !promotionImprovesCoalescing(root, bandNode, *group)) {
       continue;
     }

@@ -607,19 +604,14 @@ void promoteToSharedGreedy(
   auto bands = bandsContainingScheduleDepth(root, depth);
   bands = bandsSplitAfterDepth(bands, root, depth);

-  // 2. Compute full schedule without mapping filters. The filters would make
-  // it impossible to test for coalescing by incrementing a member of a band as
-  // only the values divisible by grid or block size pass through the filter.
-  auto fullSched = fullSchedule(root);
-
-  // 3. For each band that ends at "depth", take decisions about promotion
+  // 2. For each band that ends at "depth", take decisions about promotion
   // immediately below it in the tree. In particular, promote if the
   // approximated footprint fits into the remaining memory, and the reference
   // group either features reuse or is accessed in a non-coalesced way, or
   // both.
   size_t remainingMemory = maxMemory;
   for (auto bandNode : bands) {
-    promoteToSharedBelow(scop, bandNode, fullSched, remainingMemory);
+    promoteToSharedBelow(scop, bandNode, remainingMemory);
   }
 }

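To summarize the data flow this commit sets up inside promotionImprovesCoalescing, here is a condensed sketch assuming only the TC helpers visible in the diff (prefixSchedule, activeDomainPoints) and the isl calls already used there; the function name is invented and this is a reading aid, not the actual implementation:

// Restrict the group's accesses to the domain points active at "mapping",
// then apply the schedule prefix up to the thread-mapping marker. The
// result relates schedule points to tensor elements, so incrementing the
// dimension mapped to thread x models what adjacent threads access.
isl::union_map scheduleAccessesUpToThreadX(
    const detail::ScheduleTree* root,
    const detail::ScheduleTree* marker, // thread-mapping ancestor of the band
    const detail::ScheduleTree* mapping,
    isl::union_map originalAccesses) {
  auto activePoints = activeDomainPoints(root, mapping);
  auto localAccesses = originalAccesses.intersect_domain(activePoints);
  auto schedule = prefixSchedule(root, marker); // partial schedule suffices
  return localAccesses.apply_domain(schedule);
}

Because the coalescing test only inspects the schedule dimension mapped to thread x, which lies within this prefix, the externally precomputed fullSchedule and the extra parameter it required can be dropped.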