This repository was archived by the owner on Apr 28, 2023. It is now read-only.

Commit 847e091

Author: Sven Verdoolaege
fixThreadsBelowFilter: always fix all remaining thread identifiers to zero
Since 497d3c4 (complete thread mapping in a single band, Thu Apr 12 10:00:09 2018 +0200), a branch is mapped to all thread identifiers as soon as it gets mapped to any thread identifier. There is therefore no longer any need to specify the range of thread identifiers to fix in fixThreadsBelowFilter as it is always equal to the total number of thread identifiers.
1 parent 12d2793 commit 847e091
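
The reasoning in the commit message can be illustrated with a small stand-alone sketch. It is not code from the repository: `threadIdsToFix` is a hypothetical helper, and `total` stands in for `mscop.numThreads.view.size()`. It only models which thread identifiers end up fixed to zero once the upper bound is implied rather than passed in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the set of thread identifiers that
// fixThreadsBelowFilter fixes to zero; not part of the TC code base.
// "total" plays the role of mscop.numThreads.view.size().
std::vector<std::size_t> threadIdsToFix(std::size_t begin, std::size_t total) {
  assert(begin <= total);
  std::vector<std::size_t> ids;
  // The upper bound is always the total number of thread identifiers,
  // so only the starting position needs to be supplied by the caller.
  for (std::size_t i = begin; i < total; ++i) {
    ids.push_back(i);
  }
  return ids;
}
```

Under the behavior described for 497d3c4, a branch is mapped either to no thread identifiers or to all of them, so in this model `begin` would be either `0` (fix every identifier) or `total` (nothing left to fix).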

1 file changed: +10 -16 lines changed

tc/core/polyhedral/cuda/mapped_scop.cc

Lines changed: 10 additions & 16 deletions
@@ -147,13 +147,13 @@ void MappedScop::mapToBlocksAndScaleBand(
 /*
  * Given a filter node in the schedule tree of a mapped scop,
  * insert another filter underneath (if needed) that fixes
- * the thread identifiers in the range [begin, end) to zero.
+ * the remaining thread identifiers starting at "begin" to zero.
  */
 void fixThreadsBelowFilter(
     MappedScop& mscop,
     detail::ScheduleTree* filterTree,
-    size_t begin,
-    size_t end) {
+    size_t begin) {
+  size_t end = mscop.numThreads.view.size();
   if (begin == end) {
     return;
   }
@@ -413,21 +413,16 @@ bool hasOuterSequentialMember(
 // If any separation is needed for mapping reductions to full blocks,
 // then do so first.
 //
-// If "st" has multiple children, then make sure they are mapped
-// to the same number of thread identifiers by fixing those
-// that are originally mapped to fewer identifiers to value zero
-// for the remaining thread identifiers.
+// If "st" has multiple children and if any of those children
+// is mapped to threads, then make sure the other children
+// are also mapped to threads, by fixing the thread identifiers to value zero.
 // If, moreover, "st" is a sequence node and at least one of its
 // children is mapped to threads, then introduce synchronization
 // before and after children that are mapped to threads.
 // Also add synchronization between the last child and
 // the next iteration of the first child if there may be such
 // a next iteration that is not already covered by synchronization
 // on an outer node.
-// If any synchronization is introduced, then the mapping
-// to threads needs to be completed to all thread ids
-// because the synchronization needs to be introduced outside
-// any mapping to threads.
 size_t MappedScop::mapInnermostBandsToThreads(detail::ScheduleTree* st) {
   if (needReductionSeparation(st)) {
     st = separateReduction(st);
@@ -441,11 +436,10 @@ size_t MappedScop::mapInnermostBandsToThreads(detail::ScheduleTree* st) {
   auto n = nChildren > 0 ? *std::max_element(nInner.begin(), nInner.end()) : 0;
   if (nChildren > 1) {
     auto needSync = st->elemAs<detail::ScheduleTreeElemSequence>() && n > 0;
-    if (needSync) {
-      n = numThreads.view.size();
-    }
-    for (size_t i = 0; i < nChildren; ++i) {
-      fixThreadsBelowFilter(*this, children[i], nInner[i], n);
+    if (n > 0) {
+      for (size_t i = 0; i < nChildren; ++i) {
+        fixThreadsBelowFilter(*this, children[i], nInner[i]);
+      }
     }
     if (needSync) {
       auto outer = hasOuterSequentialMember(scop_->scheduleRoot(), st);
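
As a rough check that the simplified caller behaves like the old one, the two call patterns can be modeled outside the schedule tree. This is a sketch under assumptions, not TC code: `fixedCountsOld` and `fixedCountsNew` are hypothetical helpers, `nInner[i]` is the number of thread identifiers child `i` is already mapped to, and `total` stands in for `numThreads.view.size()`. Each function returns, per child, how many thread identifiers would be fixed to zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Model of the call pattern before this commit (hypothetical helper).
std::vector<std::size_t> fixedCountsOld(
    const std::vector<std::size_t>& nInner,
    std::size_t total,
    bool isSequence) {
  std::vector<std::size_t> counts(nInner.size(), 0);
  std::size_t n =
      nInner.empty() ? 0 : *std::max_element(nInner.begin(), nInner.end());
  assert(n <= total);
  if (nInner.size() > 1) {
    bool needSync = isSequence && n > 0;
    if (needSync) {
      n = total;
    }
    for (std::size_t i = 0; i < nInner.size(); ++i) {
      counts[i] = n - nInner[i]; // size of the range [nInner[i], n)
    }
  }
  return counts;
}

// Model of the call pattern after this commit (hypothetical helper).
std::vector<std::size_t> fixedCountsNew(
    const std::vector<std::size_t>& nInner,
    std::size_t total) {
  std::vector<std::size_t> counts(nInner.size(), 0);
  std::size_t n =
      nInner.empty() ? 0 : *std::max_element(nInner.begin(), nInner.end());
  assert(n <= total);
  if (nInner.size() > 1 && n > 0) {
    for (std::size_t i = 0; i < nInner.size(); ++i) {
      counts[i] = total - nInner[i]; // size of the range [nInner[i], total)
    }
  }
  return counts;
}
```

When every entry of `nInner` is either `0` or `total`, which is the situation the commit message describes, the two functions in this model return the same counts whether or not the node is a sequence.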
