Commit 2eaef53

[TTI] BasicTTIImplBase::getInterleavedMemoryOpCost(): fix load discounting
The math here is:
  Cost of 1 load = cost of n loads / n
  Cost of live loads = num live loads * Cost of 1 load
  Cost of live loads = num live loads * (cost of n loads / n)
  Cost of live loads = cost of n loads * (num live loads / n)

But, all the variables here are integers, and integer division rounds down, but this calculation clearly expects float semantics.

Instead multiply upfront, and then perform round-up-division.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112302
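The rounding problem described above can be reproduced in isolation. The following is a minimal sketch, not the LLVM code itself: `divideCeilSketch`, `scaleDivideFirst`, and `scaleMultiplyFirst` are hypothetical names, plain `uint64_t` stands in for the cost type, and the numbers are illustrative rather than taken from the affected test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a round-up division helper: returns
// ceil(Numerator / Denominator) for a non-zero Denominator.
constexpr uint64_t divideCeilSketch(uint64_t Numerator, uint64_t Denominator) {
  return (Numerator + Denominator - 1) / Denominator;
}

// Old shape of the computation: the fraction (NumLive / NumLegal) is
// evaluated first in integer arithmetic, so it truncates to 0 whenever
// fewer than all of the legal loads are live.
constexpr uint64_t scaleDivideFirst(uint64_t CostOfAllLoads, uint64_t NumLive,
                                    uint64_t NumLegal) {
  return CostOfAllLoads * (NumLive / NumLegal);
}

// New shape: multiply upfront, then divide rounding up.
constexpr uint64_t scaleMultiplyFirst(uint64_t CostOfAllLoads, uint64_t NumLive,
                                      uint64_t NumLegal) {
  return divideCeilSketch(NumLive * CostOfAllLoads, NumLegal);
}

// Illustrative values: 8 legal loads costing 8 in total, 2 of them live.
static_assert(scaleDivideFirst(8, 2, 8) == 0, "fraction truncates to zero");
static_assert(scaleMultiplyFirst(8, 2, 8) == 2, "2 * 8 / 8 == 2");

// A case where the exact result is fractional (3 * 4 / 8 == 1.5): the
// multiply-first form rounds up to 2, the divide-first form still yields 0.
static_assert(scaleDivideFirst(4, 3, 8) == 0, "fraction truncates to zero");
static_assert(scaleMultiplyFirst(4, 3, 8) == 2, "ceil(12 / 8) == 2");
```

With the divide-first form, the load cost collapses to zero for any partially used group, which is consistent with the estimated cost rising in the updated test expectation.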
1 parent 8ae83a1 commit 2eaef53

2 files changed, 4 additions and 3 deletions

llvm/include/llvm/CodeGen/BasicTTIImpl.h

Lines changed: 3 additions & 2 deletions
@@ -1214,7 +1214,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     //
     // TODO: Note that legalization can turn masked loads/stores into unmasked
     // (legalized) loads/stores. This can be reflected in the cost.
-    if (VecTySize > VecTyLTSize) {
+    if (Cost.isValid() && VecTySize > VecTyLTSize) {
       // The number of loads of a legal type it will take to represent a load
       // of the unlegalized vector type.
       unsigned NumLegalInsts = divideCeil(VecTySize, VecTyLTSize);
@@ -1231,7 +1231,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
 
       // Scale the cost of the load by the fraction of legal instructions that
       // will be used.
-      Cost *= UsedInsts.count() / NumLegalInsts;
+      Cost = divideCeil(UsedInsts.count() * Cost.getValue().getValue(),
+                        NumLegalInsts);
     }
 
     // Then plus the cost of interleave operation.

llvm/test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@ entry:
 ; gaps.
 ;
 ; VF_2-LABEL: Checking a loop in "i64_factor_8"
-; VF_2: Found an estimated cost of 6 for VF 2 For instruction: %tmp2 = load i64, i64* %tmp0, align 8
+; VF_2: Found an estimated cost of 10 for VF 2 For instruction: %tmp2 = load i64, i64* %tmp0, align 8
 ; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i64, i64* %tmp1, align 8
 ; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, i64* %tmp0, align 8
 ; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, i64* %tmp1, align 8
