Commit a704e65

[flang] Added alternative inlining code for hlfir.cshift. (#129176)
Flang generates slower code for the `CSHIFT(CSHIFT(PTR(:,:,I),sh1,1),sh2,2)` pattern in facerec than other compilers. The first CSHIFT can be done as two memcpys wrapped in a loop over the second dimension. This does require creating a temporary array, but it seems to be faster than the current hlfir.elemental inlining.

I started with modifying the new index computation in hlfir.elemental inlining: the new arith.select approach does enable some vectorization in LLVM, but on x86 it uses gathers/scatters and does not give much speed-up.

I also experimented with LoopBoundSplitPass and InductiveRangeCheckElimination for a simple (not chained) CSHIFT case, but I could not adjust them to split the loop with a condition on the value of the IV into two loops with disjoint iteration spaces. I thought that if I could do it, I would be able to keep the hlfir.elemental inlining mostly untouched, and then adjust the hlfir.elemental inlining heuristics for the facerec case.

Since I was not able to make these passes work for me, I added special-case inlining for CSHIFT(ARRAY,SH,DIM=1) via hlfir.eval_in_mem. If ARRAY is not statically known to have a contiguous leading dimension, there is a dynamic check for contiguity, which exposes it to LLVM and enables the rewrite of the copy loops into memcpys. This approach is stepping on the toes of LoopVersioning, but it is helpful in the facerec case.

I measured ~6% speed-up on grace, and ~4% on zen4.
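To make the "two memcpys wrapped in a loop" idea concrete, here is a minimal standalone sketch in plain C++, not the code the compiler generates. It assumes a column-major 2D array of `int` with a contiguous leading dimension; the function name `cshiftDim1` and its parameters are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: models CSHIFT(ARRAY, SHIFT, DIM=1) for a
// column-major rows x cols array stored contiguously in `src`.
// The result is written to a temporary, as in the commit description.
std::vector<int> cshiftDim1(const std::vector<int> &src, std::size_t rows,
                            std::size_t cols, long shift) {
  std::vector<int> dst(src.size());
  if (rows == 0)
    return dst;

  // Normalize the shift into [0, rows), matching Fortran MODULO semantics
  // so that negative shifts rotate in the opposite direction.
  const long n = static_cast<long>(rows);
  const std::size_t sh = static_cast<std::size_t>(((shift % n) + n) % n);

  // One iteration per column: each column is rotated with two block copies.
  for (std::size_t j = 0; j < cols; ++j) {
    const int *in = src.data() + j * rows;
    int *out = dst.data() + j * rows;
    // Elements sh..rows-1 move to the front of the result column.
    std::memcpy(out, in + sh, (rows - sh) * sizeof(int));
    // Elements 0..sh-1 wrap around to the back.
    std::memcpy(out + (rows - sh), in, sh * sizeof(int));
  }
  return dst;
}
```

For a 3x2 array holding 1..6 in column-major order, a shift of 1 rotates each column independently, which mirrors what CSHIFT along the first dimension does. The block copies are only valid when the leading dimension is contiguous, which is why the commit adds a dynamic contiguity check when this is not known statically.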
1 parent d2c4d1e commit a704e65

4 files changed, +657 -226 lines changed

flang/include/flang/Optimizer/Builder/HLFIRTools.h

Lines changed: 13 additions & 0 deletions
@@ -517,6 +517,19 @@ Entity loadElementAt(mlir::Location loc, fir::FirOpBuilder &builder,
 llvm::SmallVector<mlir::Value, Fortran::common::maxRank>
 genExtentsVector(mlir::Location loc, fir::FirOpBuilder &builder, Entity entity);

+/// Generate an hlfir.designate that produces an 1D section
+/// of \p array using \p oneBasedIndices and \p dim:
+///   i = oneBasedIndices
+///   result => array(i(1), ..., i(dim-1), :, i(dim+1), ..., i(n))
+///
+/// The caller provides the pre-computed \p lbounds, \p extents
+/// and \p typeParams of the array.
+Entity gen1DSection(mlir::Location loc, fir::FirOpBuilder &builder,
+                    Entity array, int64_t dim,
+                    mlir::ArrayRef<mlir::Value> lbounds,
+                    mlir::ArrayRef<mlir::Value> extents,
+                    mlir::ValueRange oneBasedIndices,
+                    mlir::ArrayRef<mlir::Value> typeParams);
 } // namespace hlfir

 #endif // FORTRAN_OPTIMIZER_BUILDER_HLFIRTOOLS_H

flang/lib/Optimizer/Builder/HLFIRTools.cpp

Lines changed: 49 additions & 0 deletions
@@ -1535,3 +1535,52 @@ hlfir::genExtentsVector(mlir::Location loc, fir::FirOpBuilder &builder,
   shape.getDefiningOp()->erase();
   return extents;
 }
+
+hlfir::Entity hlfir::gen1DSection(mlir::Location loc,
+                                  fir::FirOpBuilder &builder,
+                                  hlfir::Entity array, int64_t dim,
+                                  mlir::ArrayRef<mlir::Value> lbounds,
+                                  mlir::ArrayRef<mlir::Value> extents,
+                                  mlir::ValueRange oneBasedIndices,
+                                  mlir::ArrayRef<mlir::Value> typeParams) {
+  assert(array.isVariable() && "array must be a variable");
+  assert(dim > 0 && dim <= array.getRank() && "invalid dim number");
+  mlir::Value one =
+      builder.createIntegerConstant(loc, builder.getIndexType(), 1);
+  hlfir::DesignateOp::Subscripts subscripts;
+  unsigned indexId = 0;
+  for (int i = 0; i < array.getRank(); ++i) {
+    if (i == dim - 1) {
+      mlir::Value ubound = genUBound(loc, builder, lbounds[i], extents[i], one);
+      subscripts.emplace_back(
+          hlfir::DesignateOp::Triplet{lbounds[i], ubound, one});
+    } else {
+      mlir::Value index =
+          genUBound(loc, builder, lbounds[i], oneBasedIndices[indexId++], one);
+      subscripts.emplace_back(index);
+    }
+  }
+  mlir::Value sectionShape =
+      builder.create<fir::ShapeOp>(loc, extents[dim - 1]);
+
+  // The result type is one of:
+  //   !fir.box/class<!fir.array<NxT>>
+  //   !fir.box/class<!fir.array<?xT>>
+  //
+  // We could use !fir.ref<!fir.array<NxT>> when the whole dimension's
+  // size is known and it is the leading dimension, but let it be simple
+  // for the time being.
+  auto seqType =
+      mlir::cast<fir::SequenceType>(array.getElementOrSequenceType());
+  int64_t dimExtent = seqType.getShape()[dim - 1];
+  mlir::Type sectionType =
+      fir::SequenceType::get({dimExtent}, seqType.getEleTy());
+  sectionType = fir::wrapInClassOrBoxType(sectionType, array.isPolymorphic());
+
+  auto designate = builder.create<hlfir::DesignateOp>(
+      loc, sectionType, array, /*component=*/"", /*componentShape=*/nullptr,
+      subscripts,
+      /*substring=*/mlir::ValueRange{}, /*complexPartAttr=*/std::nullopt,
+      sectionShape, typeParams);
+  return hlfir::Entity{designate.getResult()};
+}
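
The subscript construction in gen1DSection can be modeled outside MLIR with plain integers. The sketch below is not part of the commit. It assumes that `genUBound(lb, extent, one)` evaluates to `lb + extent - 1`, which is how the function uses it both for the upper bound of the triplet and for turning a one-based index into an index relative to the lower bound. The names `Triplet`, `Subscript` and `sectionSubscripts` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical stand-ins for hlfir::DesignateOp::Triplet and a subscript
// that is either a scalar index or a lb:ub:stride triplet.
struct Triplet {
  std::int64_t lb, ub, stride;
};
using Subscript = std::variant<std::int64_t, Triplet>;

// Models the loop in gen1DSection: dimension `dim` (one-based) becomes a
// full-range triplet, every other dimension gets a scalar subscript derived
// from the next entry of `oneBasedIndices`.
std::vector<Subscript>
sectionSubscripts(std::int64_t dim, const std::vector<std::int64_t> &lbounds,
                  const std::vector<std::int64_t> &extents,
                  const std::vector<std::int64_t> &oneBasedIndices) {
  std::vector<Subscript> subscripts;
  std::size_t indexId = 0;
  for (std::size_t i = 0; i < lbounds.size(); ++i) {
    if (static_cast<std::int64_t>(i) == dim - 1) {
      // Whole dimension: lb : lb + extent - 1 : 1.
      subscripts.emplace_back(
          Triplet{lbounds[i], lbounds[i] + extents[i] - 1, 1});
    } else {
      // One-based index rebased onto the dimension's lower bound.
      subscripts.emplace_back(lbounds[i] + oneBasedIndices[indexId++] - 1);
    }
  }
  return subscripts;
}
```

With lower bounds (0, 5, -2), extents (4, 3, 6), `dim = 2` and one-based indices (2, 3), the model yields the subscripts `1`, `5:7:1` and `0`, i.e. a rank-1 section along the second dimension. Note that `oneBasedIndices` holds one entry per dimension other than `dim`, so it has rank minus one elements.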
