Commit c9a87a5

[SLPVectorizer] Use accurate cost for external users of resize shuffles (#137419)
When implementing the vectorization, we may need to add shuffles for external users. In such cases, we may be shuffling a smaller vector into a larger vector. When this happens, `ResizeToVF` just builds a poison-padded identity vector. Then, to build the final shuffle, we just use the `SK_InsertSubvector` mask.

This is possibly clearer when looking at the included test in SLPVectorizer/AMDGPU/external-shuffle.ll. In the exit block we have a bunch of shuffles that glue the vectorized tree to the `InsertElement` users. `TMP25` holds the result of resizing the v2i16 vectorized sequence to match the `InsertElement` size, v16i16. Then `TMP26` is the final shuffle, which replaces the `InsertElement` sequence. This is just an insertsubvector.

However, when calculating the cost for these shuffles, we aren't modelling this correctly. `ResizeToVF` will indicate to `performExtractsShuffleAction` that we cannot use the original mask due to the resize shuffle. The consequence is that the cost calculation uses a different shuffle mask than the one that is ultimately emitted.

Going back to the included test, consider `TMP26` again. The shuffle clearly uses a mask {0, 1, 2, 3, 16, 17, poison ..}. However, we currently calculate the cost with a mask {0, 1, 2, 3, 20, 21, ...}: we have replaced 16 and 17 with 20 and 21 (Index + Vector Size). Queries like BasicTTImpl::improveShuffleKindFromMask will not recognize this as an `SK_InsertSubvector` mask, and targets which have reduced costs for `SK_InsertSubvector` will not accurately calculate the cost.
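To make the mask mismatch concrete, here is a minimal standalone sketch. It does not use LLVM's headers or reproduce its matching logic: `looksLikeInsertSubvector` and `kPoison` are hypothetical names, and the rule it applies (first-source lanes in place, second-source lanes forming one contiguous run that starts at element 0 of the second source) is a deliberate simplification of what a cost model might look for.

```cpp
#include <cassert>
#include <vector>

// Assumption: poison lanes are encoded as -1, mirroring PoisonMaskElem.
constexpr int kPoison = -1;

// Hypothetical, simplified recognizer (not LLVM's implementation).
// A mask is accepted when every first-source lane is in place (Mask[I] == I)
// and the second-source lanes form one contiguous run that reads elements
// 0, 1, 2, ... of the second source.
bool looksLikeInsertSubvector(const std::vector<int> &Mask, int NumSrcElts) {
  assert(NumSrcElts > 0 && "source vector must have at least one element");
  int First = -1, Last = -1;
  for (int I = 0, E = static_cast<int>(Mask.size()); I < E; ++I) {
    int M = Mask[I];
    if (M == kPoison)
      continue;
    if (M < NumSrcElts) {
      if (M != I)
        return false; // first-source lane moved: not a plain insert
      continue;
    }
    if (First < 0)
      First = I;
    Last = I;
  }
  if (First < 0)
    return false; // nothing is taken from the second source
  for (int I = First; I <= Last; ++I)
    if (Mask[I] != NumSrcElts + (I - First))
      return false; // run is interrupted or does not start at element 0
  return true;
}
```

Under this simplified rule, the mask from the test ({0, 1, 2, 3, 16, 17, poison ..} with 16-element sources) is accepted, while the mask used for costing ({0, 1, 2, 3, 20, 21, ...}) is rejected because its second-source run starts at element 4 rather than element 0.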
1 parent 9eb0020 commit c9a87a5

4 files changed

+99
-119
lines changed

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Lines changed: 38 additions & 16 deletions
@@ -14910,25 +14910,47 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals,

   Cost += ExtractCost;
   auto &&ResizeToVF = [this, &Cost](const TreeEntry *TE, ArrayRef<int> Mask,
-                                    bool) {
+                                    bool ForSingleMask) {
     InstructionCost C = 0;
     unsigned VF = Mask.size();
     unsigned VecVF = TE->getVectorFactor();
-    if (VF != VecVF &&
-        (any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); }) ||
-         !ShuffleVectorInst::isIdentityMask(Mask, VF))) {
-      SmallVector<int> OrigMask(VecVF, PoisonMaskElem);
-      std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),
-                OrigMask.begin());
-      C = ::getShuffleCost(*TTI, TTI::SK_PermuteSingleSrc,
-                           getWidenedType(TE->getMainOp()->getType(), VecVF),
-                           OrigMask);
-      LLVM_DEBUG(
-          dbgs() << "SLP: Adding cost " << C
-                 << " for final shuffle of insertelement external users.\n";
-          TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
-      Cost += C;
-      return std::make_pair(TE, true);
+    bool HasLargeIndex =
+        any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); });
+    if ((VF != VecVF && HasLargeIndex) ||
+        !ShuffleVectorInst::isIdentityMask(Mask, VF)) {
+
+      if (HasLargeIndex) {
+        SmallVector<int> OrigMask(VecVF, PoisonMaskElem);
+        std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),
+                  OrigMask.begin());
+        C = ::getShuffleCost(*TTI, TTI::SK_PermuteSingleSrc,
+                             getWidenedType(TE->getMainOp()->getType(), VecVF),
+                             OrigMask);
+        LLVM_DEBUG(
+            dbgs() << "SLP: Adding cost " << C
+                   << " for final shuffle of insertelement external users.\n";
+            TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
+        Cost += C;
+        return std::make_pair(TE, true);
+      }
+
+      if (!ForSingleMask) {
+        SmallVector<int> ResizeMask(VF, PoisonMaskElem);
+        for (unsigned I = 0; I < VF; ++I) {
+          if (Mask[I] != PoisonMaskElem)
+            ResizeMask[Mask[I]] = Mask[I];
+        }
+        if (!ShuffleVectorInst::isIdentityMask(ResizeMask, VF))
+          C = ::getShuffleCost(
+              *TTI, TTI::SK_PermuteSingleSrc,
+              getWidenedType(TE->getMainOp()->getType(), VecVF), ResizeMask);
+        LLVM_DEBUG(
+            dbgs() << "SLP: Adding cost " << C
+                   << " for final shuffle of insertelement external users.\n";
+            TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
+
+        Cost += C;
+      }
     }
     return std::make_pair(TE, false);
   };
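
The `ResizeMask` loop in the new `!ForSingleMask` branch can be tried in isolation. The following is a minimal sketch using `std::vector` instead of LLVM's `SmallVector`; `buildResizeMask`, `isIdentityOrPoison`, and `kPoisonLane` are hypothetical stand-ins, and the sketch assumes (as that branch does, since `HasLargeIndex` is false there) that every non-poison index is smaller than the mask size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: poison lanes are encoded as -1, mirroring PoisonMaskElem.
constexpr int kPoisonLane = -1;

// Mirrors the loop from the diff: each source lane referenced by Mask is
// marked in place in the result; unreferenced lanes stay poison.
std::vector<int> buildResizeMask(const std::vector<int> &Mask) {
  const std::size_t VF = Mask.size();
  std::vector<int> ResizeMask(VF, kPoisonLane);
  for (std::size_t I = 0; I < VF; ++I) {
    int M = Mask[I];
    if (M == kPoisonLane)
      continue;
    // Sketch-only precondition: no index reaches past the mask size.
    assert(M >= 0 && static_cast<std::size_t>(M) < VF);
    ResizeMask[M] = M;
  }
  return ResizeMask;
}

// Simplified identity test: every lane is either poison or equal to its
// own position. This is a stand-in, not ShuffleVectorInst::isIdentityMask.
bool isIdentityOrPoison(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != kPoisonLane && Mask[I] != static_cast<int>(I))
      return false;
  return true;
}
```

For example, `buildResizeMask({2, kPoisonLane, 0, kPoisonLane})` yields `{0, kPoisonLane, 2, kPoisonLane}`: lanes 0 and 2 are referenced by the input, so they are marked in place, and the other lanes remain poison.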
