
Commit 147c727

ergawy authored and github-actions[bot] committed
Automerge: [flang][do concurrent] Re-model `reduce` to match how reductions are modelled in OpenMP and OpenACC (#145837)
This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following:

* A new `fir` op, `fir.declare_reduction`, which is identical to OpenMP's `omp.declare_reduction` op.
* Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op.
* Re-using the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more general.
* Updating the mapping of `do concurrent` to `fir.do_loop ... unordered` nests to use the new reduction model.

Unfortunately, this is a big PR that would be difficult to divide into smaller parts, because the foundation of the changes is the `fir` TableGen changes to `do concurrent`. These MLIR changes cascade into the other parts, which have to be modified to avoid breaking things.

This PR goes in the same direction we went for `private/local` specifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way, which hopefully makes mapping between them more straightforward.

PR stack:
- llvm/llvm-project#145837 (this one)
- llvm/llvm-project#146025
- llvm/llvm-project#146028
- llvm/llvm-project#146033
2 parents 50869ee + eba35cc commit 147c727

File tree

19 files changed

+795
-319
lines changed

flang/include/flang/Optimizer/Dialect/FIRAttr.td

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ def fir_ReduceOperationEnum : I32BitEnumAttr<"ReduceOperationEnum",
112112
I32BitEnumAttrCaseBit<"MIN", 7, "min">,
113113
I32BitEnumAttrCaseBit<"IAND", 8, "iand">,
114114
I32BitEnumAttrCaseBit<"IOR", 9, "ior">,
115-
I32BitEnumAttrCaseBit<"EIOR", 10, "eior">
115+
I32BitEnumAttrCaseBit<"IEOR", 10, "ieor">
116116
]> {
117117
let separator = ", ";
118118
let cppNamespace = "::fir";
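
The rename above only touches one case of a bit enum. As a rough illustration of what such an enum encodes, here is a minimal plain-C++ sketch. It assumes, following MLIR's usual bit-enum convention, that the second argument of `I32BitEnumAttrCaseBit` is a bit position and that `separator` joins the keywords of the set bits. The `ReduceOp`, `bits`, and `stringify` names are hypothetical stand-ins rather than the generated `fir::ReduceOperationEnum` API, and only the four cases visible in this hunk are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the generated enum. Only the four cases shown in
// the hunk are modelled; each case occupies the bit given by its position.
enum class ReduceOp : std::uint32_t {
  MIN = 1u << 7,
  IAND = 1u << 8,
  IOR = 1u << 9,
  IEOR = 1u << 10, // previously misspelled as EIOR
};

// Raw bit pattern of a single case.
constexpr std::uint32_t bits(ReduceOp op) {
  return static_cast<std::uint32_t>(op);
}

// Join the keywords of all set bits with ", ", mirroring `let separator`.
std::string stringify(std::uint32_t mask) {
  struct Case {
    ReduceOp op;
    const char *keyword;
  };
  static const Case cases[] = {{ReduceOp::MIN, "min"},
                               {ReduceOp::IAND, "iand"},
                               {ReduceOp::IOR, "ior"},
                               {ReduceOp::IEOR, "ieor"}};
  std::string out;
  for (const Case &c : cases) {
    if ((mask & bits(c.op)) == 0)
      continue;
    if (!out.empty())
      out += ", ";
    out += c.keyword;
  }
  return out;
}
```

Under these assumptions, `bits(ReduceOp::IEOR)` is `1 << 10`, and a mask with both `IOR` and `IEOR` set is rendered as the two keywords separated by `", "`.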

flang/include/flang/Optimizer/Dialect/FIROps.td

Lines changed: 128 additions & 8 deletions
@@ -3518,7 +3518,7 @@ def fir_BoxTotalElementsOp
35183518

35193519
def YieldOp : fir_Op<"yield",
35203520
[Pure, ReturnLike, Terminator,
3521-
ParentOneOf<["LocalitySpecifierOp"]>]> {
3521+
ParentOneOf<["LocalitySpecifierOp", "DeclareReductionOp"]>]> {
35223522
let summary = "loop yield and termination operation";
35233523
let description = [{
35243524
"fir.yield" yields SSA values from a fir dialect op region and
@@ -3656,6 +3656,103 @@ def fir_LocalitySpecifierOp : fir_Op<"local", [IsolatedFromAbove]> {
36563656
let hasRegionVerifier = 1;
36573657
}
36583658

3659+
def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove,
3660+
Symbol]> {
3661+
let summary = "declares a reduction kind";
3662+
let description = [{
3663+
Note: this operation is adapted from omp::DeclareReductionOp. There is a lot of
3664+
duplication at the moment. TODO Combine both ops into one. See:
3665+
https://discourse.llvm.org/t/dialect-for-data-locality-sharing-specifiers-clauses-in-openmp-openacc-and-do-concurrent/86108.
3666+
3667+
Declares a `do concurrent` reduction. This requires two mandatory and three
3668+
optional regions.
3669+
3670+
1. The optional alloc region specifies how to allocate the thread-local
3671+
reduction value. This region should not contain control flow and all
3672+
IR should be suitable for inlining straight into an entry block. In
3673+
the common case this is expected to contain only allocas. It is
3674+
expected to `fir.yield` the allocated value on all control paths.
3675+
If allocation is conditional (e.g. only allocate if the mold is
3676+
allocated), this should be done in the initializer region and this
3677+
region not included. The alloc region is not used for by-value
3678+
reductions (where allocation is implicit).
3679+
2. The initializer region specifies how to initialize the thread-local
3680+
reduction value. This is usually the neutral element of the reduction.
3681+
For convenience, the region has an argument that contains the value
3682+
of the reduction accumulator at the start of the reduction. If an alloc
3683+
region is specified, there is a second block argument containing the
3684+
address of the allocated memory. The initializer region is expected to
3685+
`fir.yield` the new value on all control flow paths.
3686+
3. The reduction region specifies how to combine two values into one, i.e.
3687+
the reduction operator. It accepts the two values as arguments and is
3688+
expected to `fir.yield` the combined value on all control flow paths.
3689+
4. The atomic reduction region is optional and specifies how two values
3690+
can be combined atomically given local accumulator variables. It is
3691+
expected to store the combined value in the first accumulator variable.
3692+
5. The cleanup region is optional and specifies how to clean up any memory
3693+
allocated by the initializer region. The region has an argument that
3694+
contains the value of the thread-local reduction accumulator. This will
3695+
be executed after the reduction has completed.
3696+
3697+
Note that the MLIR type system does not allow for type-polymorphic
3698+
reductions. Separate reduction declarations should be created for different
3699+
element and accumulator types.
3700+
3701+
For initializer and reduction regions, the operand to `fir.yield` must
3702+
match the parent operation's results.
3703+
}];
3704+
3705+
let arguments = (ins SymbolNameAttr:$sym_name,
3706+
TypeAttr:$type);
3707+
3708+
let regions = (region MaxSizedRegion<1>:$allocRegion,
3709+
AnyRegion:$initializerRegion,
3710+
AnyRegion:$reductionRegion,
3711+
AnyRegion:$atomicReductionRegion,
3712+
AnyRegion:$cleanupRegion);
3713+
3714+
let assemblyFormat = "$sym_name `:` $type attr-dict-with-keyword "
3715+
"( `alloc` $allocRegion^ )? "
3716+
"`init` $initializerRegion "
3717+
"`combiner` $reductionRegion "
3718+
"( `atomic` $atomicReductionRegion^ )? "
3719+
"( `cleanup` $cleanupRegion^ )? ";
3720+
3721+
let extraClassDeclaration = [{
3722+
mlir::BlockArgument getAllocMoldArg() {
3723+
auto &region = getAllocRegion();
3724+
return region.empty() ? nullptr : region.getArgument(0);
3725+
}
3726+
mlir::BlockArgument getInitializerMoldArg() {
3727+
return getInitializerRegion().getArgument(0);
3728+
}
3729+
mlir::BlockArgument getInitializerAllocArg() {
3730+
return getAllocRegion().empty() ?
3731+
nullptr : getInitializerRegion().getArgument(1);
3732+
}
3733+
mlir::BlockArgument getReductionLhsArg() {
3734+
return getReductionRegion().getArgument(0);
3735+
}
3736+
mlir::BlockArgument getReductionRhsArg() {
3737+
return getReductionRegion().getArgument(1);
3738+
}
3739+
mlir::BlockArgument getAtomicReductionLhsArg() {
3740+
auto &region = getAtomicReductionRegion();
3741+
return region.empty() ? nullptr : region.getArgument(0);
3742+
}
3743+
mlir::BlockArgument getAtomicReductionRhsArg() {
3744+
auto &region = getAtomicReductionRegion();
3745+
return region.empty() ? nullptr : region.getArgument(1);
3746+
}
3747+
mlir::BlockArgument getCleanupAllocArg() {
3748+
auto &region = getCleanupRegion();
3749+
return region.empty() ? nullptr : region.getArgument(0);
3750+
}
3751+
}];
3752+
3753+
let hasRegionVerifier = 1;
3754+
}
3755+
36593756
def fir_DoConcurrentOp : fir_Op<"do_concurrent",
36603757
[SingleBlock, AutomaticAllocationScope]> {
36613758
let summary = "do concurrent loop wrapper";
@@ -3694,6 +3791,25 @@ def fir_LocalSpecifier {
36943791
);
36953792
}
36963793

3794+
def fir_ReduceSpecifier {
3795+
dag arguments = (ins
3796+
Variadic<AnyType>:$reduce_vars,
3797+
OptionalAttr<DenseBoolArrayAttr>:$reduce_byref,
3798+
3799+
// This introduces redundancy in how reductions are modelled. In particular,
3800+
// a single reduction is represented by 2 attributes:
3801+
//
3802+
// 1. `$reduce_syms` which is a list of `DeclareReductionOp`s.
3803+
// 2. `$reduce_attrs` which is an array of `fir::ReduceAttr` values.
3804+
//
3805+
// The first makes it easier to map `do concurrent` to parallelization models
3806+
// (e.g. OpenMP and OpenACC) while the second makes it easier to map it to
3807+
// nests of `fir.do_loop ... unordered` ops.
3808+
OptionalAttr<SymbolRefArrayAttr>:$reduce_syms,
3809+
OptionalAttr<ArrayAttr>:$reduce_attrs
3810+
);
3811+
}
3812+
36973813
def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
36983814
[AttrSizedOperandSegments, DeclareOpInterfaceMethods<LoopLikeOpInterface,
36993815
["getLoopInductionVars"]>,
@@ -3703,7 +3819,7 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
37033819
let description = [{
37043820
An operation that models a Fortran `do concurrent` loop's header and block.
37053821
This is a single-region single-block terminator op that is expected to
3706-
terminate the region of a `omp.do_concurrent` wrapper op.
3822+
terminate the region of a `fir.do_concurrent` wrapper op.
37073823

37083824
This op borrows from both `scf.parallel` and `fir.do_loop` ops. Similar to
37093825
`scf.parallel`, a loop nest takes 3 groups of SSA values as operands that
@@ -3741,8 +3857,6 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
37413857
- `lowerBound`: The group of SSA values for the nest's lower bounds.
37423858
- `upperBound`: The group of SSA values for the nest's upper bounds.
37433859
- `step`: The group of SSA values for the nest's steps.
3744-
- `reduceOperands`: The reduction SSA values, if any.
3745-
- `reduceAttrs`: Attributes to store reduction operations, if any.
37463860
- `loopAnnotation`: Loop metadata to be passed down the compiler pipeline to
37473861
LLVM.
37483862
}];
@@ -3751,12 +3865,12 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
37513865
Variadic<Index>:$lowerBound,
37523866
Variadic<Index>:$upperBound,
37533867
Variadic<Index>:$step,
3754-
Variadic<AnyType>:$reduceOperands,
3755-
OptionalAttr<ArrayAttr>:$reduceAttrs,
37563868
OptionalAttr<LoopAnnotationAttr>:$loopAnnotation
37573869
);
37583870

3759-
let arguments = !con(opArgs, fir_LocalSpecifier.arguments);
3871+
let arguments = !con(opArgs,
3872+
fir_LocalSpecifier.arguments,
3873+
fir_ReduceSpecifier.arguments);
37603874

37613875
let regions = (region SizedRegion<1>:$region);
37623876

@@ -3777,12 +3891,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
37773891
getNumLocalOperands());
37783892
}
37793893

3894+
mlir::Block::BlockArgListType getRegionReduceArgs() {
3895+
return getBody()->getArguments().slice(getNumInductionVars()
3896+
+ getNumLocalOperands(),
3897+
getNumReduceOperands());
3898+
}
3899+
37803900
/// Number of operands controlling the loop
37813901
unsigned getNumControlOperands() { return getLowerBound().size() * 3; }
37823902

37833903
// Get Number of reduction operands
37843904
unsigned getNumReduceOperands() {
3785-
return getReduceOperands().size();
3905+
return getReduceVars().size();
37863906
}
37873907

37883908
mlir::Operation::operand_range getLocalOperands() {
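
To make the roles of the `init`, `combiner`, and `cleanup` regions of `fir.declare_reduction` more concrete, the following is a minimal plain-C++ sketch of how they could cooperate. It is not MLIR or flang code: `ReductionDecl`, `reduceChunks`, and `makeAddReduction` are hypothetical names, each "chunk" stands in for the iterations handled by one thread, and the optional `alloc` and `atomic` regions are left out.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for one reduction declaration. Each member plays the
// role of one region of the op.
template <typename T>
struct ReductionDecl {
  // init: receives the accumulator's value at the start of the reduction (the
  // mold) and yields the private initial value, usually the neutral element.
  std::function<T(const T &)> init;
  // combiner: receives two values and yields the combined value.
  std::function<T(const T &, const T &)> combiner;
  // cleanup: optional; receives the private value after the reduction is done.
  std::function<void(T &)> cleanup;
};

// Reduce every chunk into a private value, then fold each private value into
// the shared accumulator. Chunks are processed sequentially in this sketch.
template <typename T>
T reduceChunks(const ReductionDecl<T> &decl, T accumulator,
               const std::vector<std::vector<T>> &chunks) {
  for (const auto &chunk : chunks) {
    T priv = decl.init(accumulator);
    for (const T &x : chunk)
      priv = decl.combiner(priv, x);
    accumulator = decl.combiner(accumulator, priv);
    if (decl.cleanup)
      decl.cleanup(priv);
  }
  return accumulator;
}

// An integer `+` reduction: neutral element 0, no cleanup needed.
inline ReductionDecl<int> makeAddReduction() {
  return {[](const int &) { return 0; },
          [](const int &a, const int &b) { return a + b; }, nullptr};
}
```

Because a `ReductionDecl<T>` is fixed to one `T`, the sketch also mirrors the note in the op description that separate declarations are needed for different element and accumulator types.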

flang/lib/Lower/Bridge.cpp

Lines changed: 61 additions & 24 deletions
@@ -12,6 +12,7 @@
1212

1313
#include "flang/Lower/Bridge.h"
1414

15+
#include "OpenMP/ReductionProcessor.h"
1516
#include "flang/Lower/Allocatable.h"
1617
#include "flang/Lower/CallInterface.h"
1718
#include "flang/Lower/Coarray.h"
@@ -127,9 +128,8 @@ struct IncrementLoopInfo {
127128
bool isConcurrent;
128129
llvm::SmallVector<const Fortran::semantics::Symbol *> localSymList;
129130
llvm::SmallVector<const Fortran::semantics::Symbol *> localInitSymList;
130-
llvm::SmallVector<
131-
std::pair<fir::ReduceOperationEnum, const Fortran::semantics::Symbol *>>
132-
reduceSymList;
131+
llvm::SmallVector<const Fortran::semantics::Symbol *> reduceSymList;
132+
llvm::SmallVector<fir::ReduceOperationEnum> reduceOperatorList;
133133
llvm::SmallVector<const Fortran::semantics::Symbol *> sharedSymList;
134134
mlir::Value loopVariable = nullptr;
135135

@@ -1993,7 +1993,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
19931993
case Fortran::parser::ReductionOperator::Operator::Ior:
19941994
return fir::ReduceOperationEnum::IOR;
19951995
case Fortran::parser::ReductionOperator::Operator::Ieor:
1996-
return fir::ReduceOperationEnum::EIOR;
1996+
return fir::ReduceOperationEnum::IEOR;
19971997
}
19981998
llvm_unreachable("illegal reduction operator");
19991999
}
@@ -2027,8 +2027,8 @@ class FirConverter : public Fortran::lower::AbstractConverter {
20272027
std::get<Fortran::parser::ReductionOperator>(reduceList->t));
20282028
for (const Fortran::parser::Name &x :
20292029
std::get<std::list<Fortran::parser::Name>>(reduceList->t)) {
2030-
info.reduceSymList.push_back(
2031-
std::make_pair(reduce_operation, x.symbol));
2030+
info.reduceSymList.push_back(x.symbol);
2031+
info.reduceOperatorList.push_back(reduce_operation);
20322032
}
20332033
}
20342034
}
@@ -2089,6 +2089,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
20892089
assign.u = Fortran::evaluate::Assignment::BoundsSpec{};
20902090
genAssignment(assign);
20912091
}
2092+
20922093
for (const Fortran::semantics::Symbol *sym : info.sharedSymList) {
20932094
const auto *hostDetails =
20942095
sym->detailsIf<Fortran::semantics::HostAssocDetails>();
@@ -2112,6 +2113,45 @@ class FirConverter : public Fortran::lower::AbstractConverter {
21122113
}
21132114
}
21142115

2116+
llvm::SmallVector<bool> reduceVarByRef;
2117+
llvm::SmallVector<mlir::Attribute> reductionDeclSymbols;
2118+
llvm::SmallVector<mlir::Attribute> nestReduceAttrs;
2119+
2120+
for (const auto &reduceOp : info.reduceOperatorList)
2121+
nestReduceAttrs.push_back(
2122+
fir::ReduceAttr::get(builder->getContext(), reduceOp));
2123+
2124+
llvm::SmallVector<mlir::Value> reduceVars;
2125+
Fortran::lower::omp::ReductionProcessor rp;
2126+
rp.processReductionArguments<fir::DeclareReductionOp>(
2127+
toLocation(), *this, info.reduceOperatorList, reduceVars,
2128+
reduceVarByRef, reductionDeclSymbols, info.reduceSymList);
2129+
2130+
doConcurrentLoopOp.getReduceVarsMutable().assign(reduceVars);
2131+
doConcurrentLoopOp.setReduceSymsAttr(
2132+
reductionDeclSymbols.empty()
2133+
? nullptr
2134+
: mlir::ArrayAttr::get(builder->getContext(),
2135+
reductionDeclSymbols));
2136+
doConcurrentLoopOp.setReduceAttrsAttr(
2137+
nestReduceAttrs.empty()
2138+
? nullptr
2139+
: mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs));
2140+
doConcurrentLoopOp.setReduceByrefAttr(
2141+
reduceVarByRef.empty() ? nullptr
2142+
: mlir::DenseBoolArrayAttr::get(
2143+
builder->getContext(), reduceVarByRef));
2144+
2145+
for (auto [sym, reduceVar] :
2146+
llvm::zip_equal(info.reduceSymList, reduceVars)) {
2147+
auto arg = doConcurrentLoopOp.getRegion().begin()->addArgument(
2148+
reduceVar.getType(), doConcurrentLoopOp.getLoc());
2149+
bindSymbol(*sym, hlfir::translateToExtendedValue(
2150+
reduceVar.getLoc(), *builder, hlfir::Entity{arg},
2151+
/*contiguousHint=*/true)
2152+
.first);
2153+
}
2154+
21152155
// Note that allocatable, types with ultimate components, and type
21162156
// requiring finalization are forbidden in LOCAL/LOCAL_INIT (F2023 C1130),
21172157
// so no clean-up needs to be generated for these entities.
@@ -2203,6 +2243,12 @@ class FirConverter : public Fortran::lower::AbstractConverter {
22032243
}
22042244
}
22052245

2246+
// Introduce a `do concurrent` scope to bind symbols corresponding to local,
2247+
// local_init, and reduce region arguments.
2248+
if (!incrementLoopNestInfo.empty() &&
2249+
incrementLoopNestInfo.back().isConcurrent)
2250+
localSymbols.pushScope();
2251+
22062252
// Increment loop begin code. (Infinite/while code was already generated.)
22072253
if (!infiniteLoop && !whileCondition)
22082254
genFIRIncrementLoopBegin(incrementLoopNestInfo, doStmtEval.dirs);
@@ -2226,6 +2272,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
22262272

22272273
// This call may generate a branch in some contexts.
22282274
genFIR(endDoEval, unstructuredContext);
2275+
2276+
if (!incrementLoopNestInfo.empty() &&
2277+
incrementLoopNestInfo.back().isConcurrent)
2278+
localSymbols.popScope();
22292279
}
22302280

22312281
/// Generate FIR to evaluate loop control values (lower, upper and step).
@@ -2408,19 +2458,6 @@ class FirConverter : public Fortran::lower::AbstractConverter {
24082458
info.stepVariable = builder->createTemporary(loc, stepValue.getType());
24092459
builder->create<fir::StoreOp>(loc, stepValue, info.stepVariable);
24102460
}
2411-
2412-
if (genDoConcurrent && nestReduceOperands.empty()) {
2413-
// Create DO CONCURRENT reduce operands and attributes
2414-
for (const auto &reduceSym : info.reduceSymList) {
2415-
const fir::ReduceOperationEnum reduceOperation = reduceSym.first;
2416-
const Fortran::semantics::Symbol *sym = reduceSym.second;
2417-
fir::ExtendedValue exv = getSymbolExtendedValue(*sym, nullptr);
2418-
nestReduceOperands.push_back(fir::getBase(exv));
2419-
auto reduceAttr =
2420-
fir::ReduceAttr::get(builder->getContext(), reduceOperation);
2421-
nestReduceAttrs.push_back(reduceAttr);
2422-
}
2423-
}
24242461
}
24252462

24262463
for (auto [info, lowerValue, upperValue, stepValue] :
@@ -2518,11 +2555,11 @@ class FirConverter : public Fortran::lower::AbstractConverter {
25182555

25192556
builder->setInsertionPointToEnd(loopWrapperOp.getBody());
25202557
auto loopOp = builder->create<fir::DoConcurrentLoopOp>(
2521-
loc, nestLBs, nestUBs, nestSts, nestReduceOperands,
2522-
nestReduceAttrs.empty()
2523-
? nullptr
2524-
: mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs),
2525-
nullptr, /*local_vars=*/std::nullopt, /*local_syms=*/nullptr);
2558+
loc, nestLBs, nestUBs, nestSts, /*loopAnnotation=*/nullptr,
2559+
/*local_vars=*/std::nullopt,
2560+
/*local_syms=*/nullptr, /*reduce_vars=*/std::nullopt,
2561+
/*reduce_byref=*/nullptr, /*reduce_syms=*/nullptr,
2562+
/*reduce_attrs=*/nullptr);
25262563

25272564
llvm::SmallVector<mlir::Type> loopBlockArgTypes(
25282565
incrementLoopNestInfo.size(), builder->getIndexType());
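
The lowering above appends one block argument per `reduce` variable to the loop body, and `getRegionReduceArgs()` in `FIROps.td` slices those arguments back out. A minimal plain-C++ sketch of that layout follows, assuming the entry-block arguments are ordered as induction variables, then `local` arguments, then `reduce` arguments, as the slice offsets in the diff suggest. `LoopBodyArgs` and `makeLoopBodyArgs` are hypothetical names, and strings stand in for MLIR block arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the loop body's entry-block argument list.
struct LoopBodyArgs {
  std::vector<std::string> args; // induction vars, then locals, then reduce
  std::size_t numInductionVars;
  std::size_t numLocal;
  std::size_t numReduce;

  // Copy `count` arguments starting at `offset`.
  std::vector<std::string> slice(std::size_t offset, std::size_t count) const {
    auto first = args.begin() + static_cast<std::ptrdiff_t>(offset);
    return std::vector<std::string>(first,
                                    first + static_cast<std::ptrdiff_t>(count));
  }

  // Local arguments start right after the induction variables.
  std::vector<std::string> localArgs() const {
    return slice(numInductionVars, numLocal);
  }

  // Reduce arguments start after the induction variables and the locals.
  std::vector<std::string> reduceArgs() const {
    return slice(numInductionVars + numLocal, numReduce);
  }
};

// Build the argument list in the assumed order.
LoopBodyArgs makeLoopBodyArgs(const std::vector<std::string> &ivs,
                              const std::vector<std::string> &locals,
                              const std::vector<std::string> &reduces) {
  LoopBodyArgs body{{}, ivs.size(), locals.size(), reduces.size()};
  body.args.insert(body.args.end(), ivs.begin(), ivs.end());
  body.args.insert(body.args.end(), locals.begin(), locals.end());
  body.args.insert(body.args.end(), reduces.begin(), reduces.end());
  return body;
}
```

Keeping the groups contiguous means each accessor needs only the group sizes, which is why the real op can compute the slice from `getNumInductionVars()`, `getNumLocalOperands()`, and `getNumReduceOperands()`.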
