You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[libomptarget][devicertl] Remove branches around setting parallelLevel
Simplifies control flow to allow store/load forwarding
This change folds two basic blocks into one, leaving a single store to parallelLevel.
This is a step towards spmd kernels with sufficiently aggressive inlining folding
the loads from parallelLevel and thus discarding the nested parallel handling
when it is unused.
Transform:
```
int threadId = GetThreadIdInBlock();
if (threadId == 0) {
parallelLevel[0] = expr;
} else if (GetLaneId() == 0) {
parallelLevel[GetWarpId()] = expr;
}
// =>
if (GetLaneId() == 0) {
parallelLevel[GetWarpId()] = expr;
}
// because
unsigned GetLaneId() { return GetThreadIdInBlock() & (WARPSIZE - 1);}
// so whenever threadId == 0, GetLaneId() is also 0.
```
That replaces a store in two distinct basic blocks with as single store.
A more aggressive follow up is possible if the threads in the warp/wave
race to write the same value to the same address. This is not done as
part of this change.
```
if (GetLaneId() == 0) {
parallelLevel[GetWarpId()] = expr;
}
// =>
parallelLevel[GetWarpId()] = expr;
// because
unsigned GetWarpId() { return GetThreadIdInBlock() / WARPSIZE; }
// so GetWarpId will index the same element for every thread in the warp
// and, because expr is lane-invariant in this case, every lane stores the
// same value to this unique address
```
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D105699
0 commit comments