Commit ab43fd1

[flang][do concurrent] Re-model `reduce` to match how reductions are modelled in OpenMP and OpenACC
This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following:

* A new `fir` op, `fir.declare_reduction`, which is identical to OpenMP's `omp.declare_reduction` op.
* An updated `reduce` clause on `fir.do_concurrent.loop` that uses the new op.
* Re-use of the `ReductionProcessor` component to emit reductions for `do concurrent`, just as we do for OpenMP. To make this possible, `ReductionProcessor` had to be refactored to be more general.
* An updated mapping of `do concurrent` to `fir.do_loop ... unordered` nests that uses the new reduction model.

Unfortunately, this is a big PR that would be difficult to split into smaller parts, because the changes are rooted in the `fir` table-gen changes to `do concurrent`. These MLIR changes cascade to the other parts, which have to be modified so that nothing breaks.

This PR goes in the same direction we took for `private/local` specifiers. The `do concurrent` and OpenMP (and OpenACC) dialects are now modelled in essentially the same way, which should make mapping between them simpler.

PR stack:
- llvm#145837 (this one)
- llvm#146025
- llvm#146028
- llvm#146033
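The reduction model described above can be illustrated outside MLIR. The following is a minimal sketch in plain C++, not flang code: `ReductionDecl` and `reduceRoundRobin` are hypothetical names, with `init` and `combiner` standing in for the regions of the same name on the new op. It assumes each worker accumulates into a private value seeded by `init`, and that the private values are merged into the original variable after the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a declared reduction: `init` yields the neutral
// element and `combiner` merges two partial values into one.
struct ReductionDecl {
  std::function<int()> init;
  std::function<int(int, int)> combiner;
};

// Hands iterations to `numWorkers` private accumulators in round-robin order,
// then merges every private accumulator into the original value.
int reduceRoundRobin(const ReductionDecl &decl, const std::vector<int> &data,
                     std::size_t numWorkers, int original) {
  assert(numWorkers > 0 && "need at least one worker");
  std::vector<int> privates(numWorkers);
  for (std::size_t w = 0; w < numWorkers; ++w) {
    privates[w] = decl.init();
    for (std::size_t i = w; i < data.size(); i += numWorkers)
      privates[w] = decl.combiner(privates[w], data[i]);
  }
  int result = original;
  for (int partial : privates)
    result = decl.combiner(result, partial);
  return result;
}
```

As long as the combiner is associative and commutative, the result does not depend on how iterations are distributed across workers.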
1 parent 273cc3d commit ab43fd1

19 files changed

+794
-319
lines changed

flang/include/flang/Lower/OpenMP/Clauses.h

Lines changed: 1 addition & 0 deletions
@@ -179,6 +179,7 @@ using IteratorSpecifier = tomp::type::IteratorSpecifierT<TypeTy, IdTy, ExprTy>;
 using DefinedOperator = tomp::type::DefinedOperatorT<IdTy, ExprTy>;
 using ProcedureDesignator = tomp::type::ProcedureDesignatorT<IdTy, ExprTy>;
 using ReductionOperator = tomp::type::ReductionIdentifierT<IdTy, ExprTy>;
+using ReductionOperatorList = List<ReductionOperator>;
 using DependenceType = tomp::type::DependenceType;
 using Prescriptiveness = tomp::type::Prescriptiveness;

flang/include/flang/Optimizer/Dialect/FIRAttr.td

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ def fir_ReduceOperationEnum : I32BitEnumAttr<"ReduceOperationEnum",
 I32BitEnumAttrCaseBit<"MIN", 7, "min">,
 I32BitEnumAttrCaseBit<"IAND", 8, "iand">,
 I32BitEnumAttrCaseBit<"IOR", 9, "ior">,
-I32BitEnumAttrCaseBit<"EIOR", 10, "eior">
+I32BitEnumAttrCaseBit<"IEOR", 10, "ieor">
 ]> {
 let separator = ", ";
 let cppNamespace = "::fir";
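The rename above brings the enum case in line with Fortran's `IEOR` intrinsic. As a rough illustration of what such a bit enum amounts to, here is a hand-written plain C++ sketch. It is not the TableGen-generated code; it assumes the second argument of `I32BitEnumAttrCaseBit` is a bit position, and `ReduceOperation` and `stringifyReduceOperation` are illustrative names covering only the four cases visible in this hunk.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative mirror of the four cases shown in the hunk; each case occupies
// the bit position given as the second TableGen argument.
enum class ReduceOperation : std::uint32_t {
  MIN = 1u << 7,
  IAND = 1u << 8,
  IOR = 1u << 9,
  IEOR = 1u << 10,
};

// Returns the keyword of every set bit, joined with ", " (the separator the
// TableGen definition declares).
std::string stringifyReduceOperation(std::uint32_t bits) {
  struct Case {
    ReduceOperation op;
    const char *keyword;
  };
  static const Case cases[] = {
      {ReduceOperation::MIN, "min"},
      {ReduceOperation::IAND, "iand"},
      {ReduceOperation::IOR, "ior"},
      {ReduceOperation::IEOR, "ieor"},
  };
  std::string out;
  for (const Case &c : cases) {
    if ((bits & static_cast<std::uint32_t>(c.op)) == 0)
      continue;
    if (!out.empty())
      out += ", ";
    out += c.keyword;
  }
  return out;
}
```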

flang/include/flang/Optimizer/Dialect/FIROps.td

Lines changed: 128 additions & 8 deletions
@@ -3518,7 +3518,7 @@ def fir_BoxTotalElementsOp

 def YieldOp : fir_Op<"yield",
 [Pure, ReturnLike, Terminator,
-ParentOneOf<["LocalitySpecifierOp"]>]> {
+ParentOneOf<["LocalitySpecifierOp", "DeclareReductionOp"]>]> {
 let summary = "loop yield and termination operation";
 let description = [{
 "fir.yield" yields SSA values from a fir dialect op region and
@@ -3656,6 +3656,103 @@ def fir_LocalitySpecifierOp : fir_Op<"local", [IsolatedFromAbove]> {
 let hasRegionVerifier = 1;
 }

+def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove,
+Symbol]> {
+let summary = "declares a reduction kind";
+let description = [{
+Note: this operation is adapted from omp::DeclareReductionOp. There is a lot
+of duplication at the moment. TODO: Combine both ops into one. See:
+https://discourse.llvm.org/t/dialect-for-data-locality-sharing-specifiers-clauses-in-openmp-openacc-and-do-concurrent/86108.
+
+Declares a `do concurrent` reduction. This requires two mandatory and three
+optional regions.
+
+1. The optional alloc region specifies how to allocate the thread-local
+reduction value. This region should not contain control flow and all
+IR should be suitable for inlining straight into an entry block. In
+the common case this is expected to contain only allocas. It is
+expected to `fir.yield` the allocated value on all control paths.
+If allocation is conditional (e.g. only allocate if the mold is
+allocated), this should be done in the initializer region and this
+region not included. The alloc region is not used for by-value
+reductions (where allocation is implicit).
+2. The initializer region specifies how to initialize the thread-local
+reduction value. This is usually the neutral element of the reduction.
+For convenience, the region has an argument that contains the value
+of the reduction accumulator at the start of the reduction. If an alloc
+region is specified, there is a second block argument containing the
+address of the allocated memory. The initializer region is expected to
+`fir.yield` the new value on all control flow paths.
+3. The reduction region specifies how to combine two values into one, i.e.
+the reduction operator. It accepts the two values as arguments and is
+expected to `fir.yield` the combined value on all control flow paths.
+4. The atomic reduction region is optional and specifies how two values
+can be combined atomically given local accumulator variables. It is
+expected to store the combined value in the first accumulator variable.
+5. The cleanup region is optional and specifies how to clean up any memory
+allocated by the initializer region. The region has an argument that
+contains the value of the thread-local reduction accumulator. This will
+be executed after the reduction has completed.
+
+Note that the MLIR type system does not allow for type-polymorphic
+reductions. Separate reduction declarations should be created for different
+element and accumulator types.
+
+For initializer and reduction regions, the operand to `fir.yield` must
+match the parent operation's results.
+}];
+
+let arguments = (ins SymbolNameAttr:$sym_name,
+TypeAttr:$type);
+
+let regions = (region MaxSizedRegion<1>:$allocRegion,
+AnyRegion:$initializerRegion,
+AnyRegion:$reductionRegion,
+AnyRegion:$atomicReductionRegion,
+AnyRegion:$cleanupRegion);
+
+let assemblyFormat = "$sym_name `:` $type attr-dict-with-keyword "
+"( `alloc` $allocRegion^ )? "
+"`init` $initializerRegion "
+"`combiner` $reductionRegion "
+"( `atomic` $atomicReductionRegion^ )? "
+"( `cleanup` $cleanupRegion^ )? ";
+
+let extraClassDeclaration = [{
+mlir::BlockArgument getAllocMoldArg() {
+auto &region = getAllocRegion();
+return region.empty() ? nullptr : region.getArgument(0);
+}
+mlir::BlockArgument getInitializerMoldArg() {
+return getInitializerRegion().getArgument(0);
+}
+mlir::BlockArgument getInitializerAllocArg() {
+return getAllocRegion().empty() ?
+nullptr : getInitializerRegion().getArgument(1);
+}
+mlir::BlockArgument getReductionLhsArg() {
+return getReductionRegion().getArgument(0);
+}
+mlir::BlockArgument getReductionRhsArg() {
+return getReductionRegion().getArgument(1);
+}
+mlir::BlockArgument getAtomicReductionLhsArg() {
+auto &region = getAtomicReductionRegion();
+return region.empty() ? nullptr : region.getArgument(0);
+}
+mlir::BlockArgument getAtomicReductionRhsArg() {
+auto &region = getAtomicReductionRegion();
+return region.empty() ? nullptr : region.getArgument(1);
+}
+mlir::BlockArgument getCleanupAllocArg() {
+auto &region = getCleanupRegion();
+return region.empty() ? nullptr : region.getArgument(0);
+}
+}];
+
+let hasRegionVerifier = 1;
+}
+
 def fir_DoConcurrentOp : fir_Op<"do_concurrent",
 [SingleBlock, AutomaticAllocationScope]> {
 let summary = "do concurrent loop wrapper";
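The region list in the `fir.declare_reduction` description above can be read as a lifecycle: allocate, initialize, combine, clean up. The sketch below models that order in plain C++ under stated assumptions: `Regions` and `runPrivateReduction` are hypothetical names, the accumulator is a single `int`, and the optional atomic region is left out. It is only meant to show which steps are skipped when the optional regions are absent, not how flang lowers the op.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of four of the five regions. Empty std::function objects
// play the role of omitted optional regions.
struct Regions {
  std::function<int *()> alloc;                            // optional
  std::function<void(int mold, int *priv)> init;           // mandatory
  std::function<void(int *lhs, const int *rhs)> combiner;  // mandatory
  std::function<void(int *priv)> cleanup;                  // optional
};

// Runs one private accumulator through its lifecycle and returns the value
// merged back into the original variable.
int runPrivateReduction(const Regions &r, int original,
                        const std::vector<int> &contributions) {
  assert(r.init && r.combiner && "init and combiner are mandatory");
  int inlineStorage = 0;
  // Without an alloc region, storage for the private value is implicit.
  int *priv = r.alloc ? r.alloc() : &inlineStorage;
  // The initializer sees the original value (the "mold") and the storage.
  r.init(original, priv);
  for (int v : contributions)
    r.combiner(priv, &v);
  int result = original;
  r.combiner(&result, priv);
  // Cleanup runs only after the reduction has completed.
  if (r.cleanup)
    r.cleanup(priv);
  return result;
}
```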
@@ -3694,6 +3791,25 @@ def fir_LocalSpecifier {
 );
 }

+def fir_ReduceSpecifier {
+dag arguments = (ins
+Variadic<AnyType>:$reduce_vars,
+OptionalAttr<DenseBoolArrayAttr>:$reduce_byref,
+
+// This introduces redundancy in how reductions are modelled. In particular,
+// a single reduction is represented by 2 attributes:
+//
+// 1. `$reduce_syms` which is a list of `DeclareReductionOp`s.
+// 2. `$reduce_attrs` which is an array of `fir::ReduceAttr` values.
+//
+// The first makes it easier to map `do concurrent` to parallelization models
+// (e.g. OpenMP and OpenACC) while the second makes it easier to map it to
+// nests of `fir.do_loop ... unordered` ops.
+OptionalAttr<SymbolRefArrayAttr>:$reduce_syms,
+OptionalAttr<ArrayAttr>:$reduce_attrs
+);
+}
+
 def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
 [AttrSizedOperandSegments, DeclareOpInterfaceMethods<LoopLikeOpInterface,
 ["getLoopInductionVars"]>,
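The comment in `fir_ReduceSpecifier` describes one reduction as an entry spread over several parallel lists. The plain C++ sketch below checks the consistency such a layout suggests. It rests on an assumption this hunk does not state: that entry `i` of every list describes the same reduction, so the lists must have equal length and each symbol must name an existing declaration. `ReduceClause` and `verifyReduceClause` are hypothetical names, and strings stand in for SSA values and attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical flattening of the four `reduce_*` arguments.
struct ReduceClause {
  std::vector<std::string> reduceVars;   // stand-ins for SSA operands
  std::vector<bool> reduceByref;         // by-reference flag per variable
  std::vector<std::string> reduceSyms;   // names of reduction declarations
  std::vector<std::string> reduceAttrs;  // operation keywords, e.g. "add"
};

// Returns true when all lists describe the same number of reductions and
// every referenced symbol is among the declared reductions.
bool verifyReduceClause(const ReduceClause &clause,
                        const std::set<std::string> &declaredReductions) {
  const std::size_t n = clause.reduceVars.size();
  if (clause.reduceByref.size() != n || clause.reduceSyms.size() != n ||
      clause.reduceAttrs.size() != n)
    return false;
  for (const std::string &sym : clause.reduceSyms)
    if (declaredReductions.count(sym) == 0)
      return false;
  return true;
}
```

A loop without reductions leaves all four lists empty and passes the check trivially.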
@@ -3703,7 +3819,7 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
 let description = [{
 An operation that models a Fortran `do concurrent` loop's header and block.
 This is a single-region single-block terminator op that is expected to
-terminate the region of a `omp.do_concurrent` wrapper op.
+terminate the region of a `fir.do_concurrent` wrapper op.

 This op borrows from both `scf.parallel` and `fir.do_loop` ops. Similar to
 `scf.parallel`, a loop nest takes 3 groups of SSA values as operands that
@@ -3741,8 +3857,6 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
 - `lowerBound`: The group of SSA values for the nest's lower bounds.
 - `upperBound`: The group of SSA values for the nest's upper bounds.
 - `step`: The group of SSA values for the nest's steps.
-- `reduceOperands`: The reduction SSA values, if any.
-- `reduceAttrs`: Attributes to store reduction operations, if any.
 - `loopAnnotation`: Loop metadata to be passed down the compiler pipeline to
 LLVM.
 }];
@@ -3751,12 +3865,12 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
 Variadic<Index>:$lowerBound,
 Variadic<Index>:$upperBound,
 Variadic<Index>:$step,
-Variadic<AnyType>:$reduceOperands,
-OptionalAttr<ArrayAttr>:$reduceAttrs,
 OptionalAttr<LoopAnnotationAttr>:$loopAnnotation
 );

-let arguments = !con(opArgs, fir_LocalSpecifier.arguments);
+let arguments = !con(opArgs,
+fir_LocalSpecifier.arguments,
+fir_ReduceSpecifier.arguments);

 let regions = (region SizedRegion<1>:$region);

@@ -3777,12 +3891,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
 getNumLocalOperands());
 }

+mlir::Block::BlockArgListType getRegionReduceArgs() {
+return getBody()->getArguments().slice(getNumInductionVars()
++ getNumLocalOperands(),
+getNumReduceOperands());
+}
+
 /// Number of operands controlling the loop
 unsigned getNumControlOperands() { return getLowerBound().size() * 3; }

 // Get Number of reduction operands
 unsigned getNumReduceOperands() {
-return getReduceOperands().size();
+return getReduceVars().size();
 }

 mlir::Operation::operand_range getLocalOperands() {
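The offset arithmetic in `getRegionReduceArgs` implies that the loop body's block arguments are laid out as induction variables first, then local operands, then reduction operands. The following plain C++ sketch reproduces that slicing with strings in place of block arguments; `LoopBlockArgs` and the free function `slice` are illustrative stand-ins, not MLIR types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns `count` entries of `args` starting at `offset`, mimicking an
// (offset, length) slice of an argument list.
std::vector<std::string> slice(const std::vector<std::string> &args,
                               std::size_t offset, std::size_t count) {
  assert(offset + count <= args.size() && "slice out of range");
  return std::vector<std::string>(args.begin() + offset,
                                  args.begin() + offset + count);
}

// Hypothetical view of the loop body's entry block arguments, assumed to be
// ordered as [induction vars | local operands | reduce operands].
struct LoopBlockArgs {
  std::vector<std::string> args;
  std::size_t numInductionVars;
  std::size_t numLocalOperands;
  std::size_t numReduceOperands;

  // The reduction arguments start after the induction and local arguments.
  std::vector<std::string> regionReduceArgs() const {
    return slice(args, numInductionVars + numLocalOperands,
                 numReduceOperands);
  }
};
```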

flang/lib/Lower/Bridge.cpp

Lines changed: 61 additions & 24 deletions
@@ -12,6 +12,7 @@

 #include "flang/Lower/Bridge.h"

+#include "OpenMP/ReductionProcessor.h"
 #include "flang/Lower/Allocatable.h"
 #include "flang/Lower/CallInterface.h"
 #include "flang/Lower/Coarray.h"
@@ -127,9 +128,8 @@ struct IncrementLoopInfo {
 bool isConcurrent;
 llvm::SmallVector<const Fortran::semantics::Symbol *> localSymList;
 llvm::SmallVector<const Fortran::semantics::Symbol *> localInitSymList;
-llvm::SmallVector<
-std::pair<fir::ReduceOperationEnum, const Fortran::semantics::Symbol *>>
-reduceSymList;
+llvm::SmallVector<const Fortran::semantics::Symbol *> reduceSymList;
+llvm::SmallVector<fir::ReduceOperationEnum> reduceOperatorList;
 llvm::SmallVector<const Fortran::semantics::Symbol *> sharedSymList;
 mlir::Value loopVariable = nullptr;

@@ -1997,7 +1997,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 case Fortran::parser::ReductionOperator::Operator::Ior:
 return fir::ReduceOperationEnum::IOR;
 case Fortran::parser::ReductionOperator::Operator::Ieor:
-return fir::ReduceOperationEnum::EIOR;
+return fir::ReduceOperationEnum::IEOR;
 }
 llvm_unreachable("illegal reduction operator");
 }
@@ -2031,8 +2031,8 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 std::get<Fortran::parser::ReductionOperator>(reduceList->t));
 for (const Fortran::parser::Name &x :
 std::get<std::list<Fortran::parser::Name>>(reduceList->t)) {
-info.reduceSymList.push_back(
-std::make_pair(reduce_operation, x.symbol));
+info.reduceSymList.push_back(x.symbol);
+info.reduceOperatorList.push_back(reduce_operation);
 }
 }
 }
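The hunks above replace one list of (operator, symbol) pairs with two parallel lists, which lets each list be passed on its own, as the later `processReductionArguments` call in this file does. Below is a minimal plain C++ sketch of that bookkeeping. `LoopInfo`, `ReduceOp`, `addReduce`, and `zipped` are hypothetical names, and `zipped` only imitates the equal-length requirement of `llvm::zip_equal` with an assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class ReduceOp { Add, Multiply, Max };

// Hypothetical reduced form of the loop info: symbols and operators are kept
// in two lists that grow in lockstep.
struct LoopInfo {
  std::vector<std::string> reduceSymList;
  std::vector<ReduceOp> reduceOperatorList;

  // One `reduce(op : names...)` specifier contributes one entry per name to
  // each list, so index i always refers to the same reduction in both.
  void addReduce(ReduceOp op, const std::vector<std::string> &names) {
    for (const std::string &name : names) {
      reduceSymList.push_back(name);
      reduceOperatorList.push_back(op);
    }
  }

  // Re-pairs the two lists, insisting that they have the same length.
  std::vector<std::pair<std::string, ReduceOp>> zipped() const {
    assert(reduceSymList.size() == reduceOperatorList.size());
    std::vector<std::pair<std::string, ReduceOp>> out;
    for (std::size_t i = 0; i < reduceSymList.size(); ++i)
      out.emplace_back(reduceSymList[i], reduceOperatorList[i]);
    return out;
  }
};
```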
@@ -2093,6 +2093,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 assign.u = Fortran::evaluate::Assignment::BoundsSpec{};
 genAssignment(assign);
 }
+
 for (const Fortran::semantics::Symbol *sym : info.sharedSymList) {
 const auto *hostDetails =
 sym->detailsIf<Fortran::semantics::HostAssocDetails>();
@@ -2116,6 +2117,45 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 }
 }

+llvm::SmallVector<bool> reduceVarByRef;
+llvm::SmallVector<mlir::Attribute> reductionDeclSymbols;
+llvm::SmallVector<mlir::Attribute> nestReduceAttrs;
+
+for (const auto &reduceOp : info.reduceOperatorList)
+nestReduceAttrs.push_back(
+fir::ReduceAttr::get(builder->getContext(), reduceOp));
+
+llvm::SmallVector<mlir::Value> reduceVars;
+Fortran::lower::omp::ReductionProcessor rp;
+rp.processReductionArguments<fir::DeclareReductionOp>(
+toLocation(), *this, info.reduceOperatorList, reduceVars,
+reduceVarByRef, reductionDeclSymbols, info.reduceSymList);
+
+doConcurrentLoopOp.getReduceVarsMutable().assign(reduceVars);
+doConcurrentLoopOp.setReduceSymsAttr(
+reductionDeclSymbols.empty()
+? nullptr
+: mlir::ArrayAttr::get(builder->getContext(),
+reductionDeclSymbols));
+doConcurrentLoopOp.setReduceAttrsAttr(
+nestReduceAttrs.empty()
+? nullptr
+: mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs));
+doConcurrentLoopOp.setReduceByrefAttr(
+reduceVarByRef.empty() ? nullptr
+: mlir::DenseBoolArrayAttr::get(
+builder->getContext(), reduceVarByRef));
+
+for (auto [sym, reduceVar] :
+llvm::zip_equal(info.reduceSymList, reduceVars)) {
+auto arg = doConcurrentLoopOp.getRegion().begin()->addArgument(
+reduceVar.getType(), doConcurrentLoopOp.getLoc());
+bindSymbol(*sym, hlfir::translateToExtendedValue(
+reduceVar.getLoc(), *builder, hlfir::Entity{arg},
+/*contiguousHint=*/true)
+.first);
+}
+
 // Note that allocatable, types with ultimate components, and type
 // requiring finalization are forbidden in LOCAL/LOCAL_INIT (F2023 C1130),
 // so no clean-up needs to be generated for these entities.
@@ -2207,6 +2247,12 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 }
 }

+// Introduce a `do concurrent` scope to bind symbols corresponding to local,
+// local_init, and reduce region arguments.
+if (!incrementLoopNestInfo.empty() &&
+incrementLoopNestInfo.back().isConcurrent)
+localSymbols.pushScope();
+
 // Increment loop begin code. (Infinite/while code was already generated.)
 if (!infiniteLoop && !whileCondition)
 genFIRIncrementLoopBegin(incrementLoopNestInfo, doStmtEval.dirs);
@@ -2230,6 +2276,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {

 // This call may generate a branch in some contexts.
 genFIR(endDoEval, unstructuredContext);
+
+if (!incrementLoopNestInfo.empty() &&
+incrementLoopNestInfo.back().isConcurrent)
+localSymbols.popScope();
 }

 /// Generate FIR to evaluate loop control values (lower, upper and step).
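The `pushScope`/`popScope` pair added above brackets the loop so that a symbol rebound to a region argument inside the loop reverts to its host binding afterwards. The sketch below shows that shadowing behaviour with a small scoped map in plain C++. `ScopedSymbolMap` is a hypothetical class written for illustration and makes no claim about how flang's own symbol map is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical scoped symbol table: lookups search from the innermost scope
// outwards, so bindings in an inner scope shadow outer ones until popped.
class ScopedSymbolMap {
public:
  ScopedSymbolMap() { scopes.emplace_back(); }

  void pushScope() { scopes.emplace_back(); }

  void popScope() {
    assert(scopes.size() > 1 && "cannot pop the outermost scope");
    scopes.pop_back();
  }

  // Binds `sym` in the innermost scope only.
  void bind(const std::string &sym, const std::string &value) {
    scopes.back()[sym] = value;
  }

  // Returns the innermost binding of `sym`, or an empty string if unbound.
  std::string lookup(const std::string &sym) const {
    for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
      auto found = it->find(sym);
      if (found != it->end())
        return found->second;
    }
    return "";
  }

private:
  std::vector<std::map<std::string, std::string>> scopes;
};
```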
@@ -2412,19 +2462,6 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 info.stepVariable = builder->createTemporary(loc, stepValue.getType());
 builder->create<fir::StoreOp>(loc, stepValue, info.stepVariable);
 }
-
-if (genDoConcurrent && nestReduceOperands.empty()) {
-// Create DO CONCURRENT reduce operands and attributes
-for (const auto &reduceSym : info.reduceSymList) {
-const fir::ReduceOperationEnum reduceOperation = reduceSym.first;
-const Fortran::semantics::Symbol *sym = reduceSym.second;
-fir::ExtendedValue exv = getSymbolExtendedValue(*sym, nullptr);
-nestReduceOperands.push_back(fir::getBase(exv));
-auto reduceAttr =
-fir::ReduceAttr::get(builder->getContext(), reduceOperation);
-nestReduceAttrs.push_back(reduceAttr);
-}
-}
 }

 for (auto [info, lowerValue, upperValue, stepValue] :
@@ -2522,11 +2559,11 @@ class FirConverter : public Fortran::lower::AbstractConverter {

 builder->setInsertionPointToEnd(loopWrapperOp.getBody());
 auto loopOp = builder->create<fir::DoConcurrentLoopOp>(
-loc, nestLBs, nestUBs, nestSts, nestReduceOperands,
-nestReduceAttrs.empty()
-? nullptr
-: mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs),
-nullptr, /*local_vars=*/std::nullopt, /*local_syms=*/nullptr);
+loc, nestLBs, nestUBs, nestSts, /*loopAnnotation=*/nullptr,
+/*local_vars=*/std::nullopt,
+/*local_syms=*/nullptr, /*reduce_vars=*/std::nullopt,
+/*reduce_byref=*/nullptr, /*reduce_syms=*/nullptr,
+/*reduce_attrs=*/nullptr);

 llvm::SmallVector<mlir::Type> loopBlockArgTypes(
 incrementLoopNestInfo.size(), builder->getIndexType());
