Commit ab97c9b

[LV] Fix scalar cost for tail predicated loops
When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop.

Original patch by Anna Welker

Differential Revision: https://reviews.llvm.org/D86452
1 parent d716eab commit ab97c9b
2 files changed: +5 -4 lines changed

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 4 additions & 3 deletions
@@ -6483,9 +6483,10 @@ LoopVectorizationCostModel::expectedCost(ElementCount VF) {
 // if-converted. This means that the block's instructions (aside from
 // stores and instructions that may divide by zero) will now be
 // unconditionally executed. For the scalar case, we may not always execute
-// the predicated block. Thus, scale the block's cost by the probability of
-// executing it.
-if (VF.isScalar() && blockNeedsPredication(BB))
+// the predicated block, if it is an if-else block. Thus, scale the block's
+// cost by the probability of executing it. blockNeedsPredication from
+// Legal is used so as to not include all blocks in tail folded loops.
+if (VF.isScalar() && Legal->blockNeedsPredication(BB))
   BlockCost.first /= getReciprocalPredBlockProb();

 Cost.first += BlockCost.first;

llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ define void @pred_loop(i32* %off, i32* %data, i32* %dst, i32 %n) #0 {
 ; CHECK-COST-NEXT: LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %add1, i32* %arrayidx2, align 4
 ; CHECK-COST-NEXT: LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond.not = icmp eq i32 %add, %n
 ; CHECK-COST-NEXT: LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond.not, label %exit.loopexit, label %for.body
-; CHECK-COST-NEXT: LV: Scalar loop costs: 2.
+; CHECK-COST-NEXT: LV: Scalar loop costs: 5.

 entry:
 %cmp8 = icmp sgt i32 %n, 0
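
The change in the expected scalar loop cost from 2 to 5 is consistent with the halving no longer being applied to a block that is predicated only because of tail folding (5 / 2 is 2 in integer arithmetic). The following is a minimal sketch of that reasoning, not the LoopVectorize implementation: `BlockInfo`, `scalarCostBefore`, `scalarCostAfter`, and `kReciprocalPredBlockProb` are hypothetical names, and the sketch assumes that the cost model's own check treated every block as predicated whenever the tail was folded, as the commit message and the new comment imply.

```cpp
// Hypothetical stand-in for getReciprocalPredBlockProb(); the commit message
// states that the scalar cost of a predicated block is divided by 2.
constexpr unsigned kReciprocalPredBlockProb = 2;

// Hypothetical, simplified description of one basic block of the loop.
struct BlockInfo {
  unsigned RawScalarCost; // sum of the per-instruction costs at VF 1
  bool IsConditional;     // true if the block sits under an if in the loop
};

// Assumed behaviour before the patch: with tail folding enabled, every block
// counts as predicated, so every block's scalar cost is halved.
unsigned scalarCostBefore(const BlockInfo &B, bool FoldTail) {
  bool NeedsPredication = FoldTail || B.IsConditional;
  return NeedsPredication ? B.RawScalarCost / kReciprocalPredBlockProb
                          : B.RawScalarCost;
}

// Behaviour after the patch, as described in the new comment: only blocks
// that are genuinely conditional in the scalar loop are scaled.
unsigned scalarCostAfter(const BlockInfo &B, bool FoldTail) {
  (void)FoldTail; // tail folding no longer affects the scalar estimate
  return B.IsConditional ? B.RawScalarCost / kReciprocalPredBlockProb
                         : B.RawScalarCost;
}
```

For an unconditional loop body with a raw cost of 5 and tail folding enabled, `scalarCostBefore` yields 2 and `scalarCostAfter` yields 5, matching the updated `CHECK-COST-NEXT` line. A block that really is conditional is still halved in both versions.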