Commit c5672e2

[AArch64][CostModel] Reduce the cost of fadd reduction with fast flag (#108791)
An fadd reduction lowers to a series of faddp instructions when (1) the fast flag is set and (2) the number of elements in the input vector is a power of 2. The faddp instruction has the same latency and throughput as the fadd instruction, so we set a relative cost of 1 for each faddp as well. The change showed no regressions on SPEC17-FP (C/C++) or llvm-test-suite on Neoverse-V2.
1 parent 70529b2 commit c5672e2

3 files changed: +107 -130 lines changed


llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Lines changed: 20 additions & 0 deletions
@@ -4159,6 +4159,26 @@ AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,
   switch (ISD) {
   default:
     break;
+  case ISD::FADD:
+    if (Type *EltTy = ValTy->getScalarType();
+        // FIXME: For half types without fullfp16 support, this could extend and
+        // use a fp32 faddp reduction but current codegen unrolls.
+        MTy.isVector() && (EltTy->isFloatTy() || EltTy->isDoubleTy() ||
+                           (EltTy->isHalfTy() && ST->hasFullFP16()))) {
+      const unsigned NElts = MTy.getVectorNumElements();
+      if (ValTy->getElementCount().getFixedValue() >= 2 && NElts >= 2 &&
+          isPowerOf2_32(NElts))
+        // Reduction corresponding to series of fadd instructions is lowered to
+        // series of faddp instructions. faddp has latency/throughput that
+        // matches fadd instruction and hence, every faddp instruction can be
+        // considered to have a relative cost = 1 with
+        // CostKind = TCK_RecipThroughput.
+        // An faddp will pairwise add vector elements, so the size of input
+        // vector reduces by half every time, requiring
+        // #(faddp instructions) = log2_32(NElts).
+        return (LT.first - 1) + /*No of faddp instructions*/ Log2_32(NElts);
+    }
+    break;
   case ISD::ADD:
     if (const auto *Entry = CostTableLookup(CostTblNoPairwise, ISD, MTy))
       return (LT.first - 1) + Entry->Cost;
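
To make the arithmetic in the new `ISD::FADD` case easier to check by hand, here is a minimal standalone sketch of the same formula, `(LT.first - 1) + Log2_32(NElts)`. It does not use any LLVM API. `LegalizedType`, `fastFaddReductionCost`, and `pairwiseSteps` are hypothetical names introduced only for illustration: `NumParts` stands in for `LT.first` and `LegalElts` for `MTy.getVectorNumElements()`. The element-type and fast-flag checks from the patch are assumed to have passed already.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the type legalization result used in the patch.
// NumParts models LT.first; LegalElts models MTy.getVectorNumElements().
struct LegalizedType {
  unsigned NumParts;
  unsigned LegalElts;
};

static bool isPowerOf2(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

// Floor of log2 for a non-zero value.
static unsigned log2Floor(unsigned V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

// Models the fast-path cost from the patch. Returns -1 when the element
// count is below 2 or not a power of 2, i.e. when the patch would fall
// through to `break` instead of returning a cost.
int fastFaddReductionCost(const LegalizedType &LT) {
  if (LT.LegalElts < 2 || !isPowerOf2(LT.LegalElts))
    return -1;
  return static_cast<int>(LT.NumParts - 1) +
         static_cast<int>(log2Floor(LT.LegalElts));
}

// Counts how many pairwise-add rounds reduce V to a single element.
// Assumes V.size() is a power of 2, matching the patch's precondition.
unsigned pairwiseSteps(std::vector<float> V) {
  unsigned Steps = 0;
  while (V.size() > 1) {
    std::vector<float> Next;
    for (std::size_t I = 0; I + 1 < V.size(); I += 2)
      Next.push_back(V[I] + V[I + 1]); // one lane of a pairwise add
    V.swap(Next);
    ++Steps;
  }
  return Steps;
}
```

Under this model, a legal 4-element vector in one part (`{1, 4}`) costs 2, and the same legal type split into two parts (`{2, 4}`) costs 3. `pairwiseSteps` on an 8-element vector returns 3, which agrees with the `log2_32(NElts)` count stated in the code comment.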
