-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[Flang][MLIR] Add !$omp unroll
and omp.unroll_heuristic
#144785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: users/meinersbur/flang_canonical-loop_ops-lowering
Are you sure you want to change the base?
[Flang][MLIR] Add !$omp unroll
and omp.unroll_heuristic
#144785
Conversation
6a45873
to
fed2aa7
Compare
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-parser Author: Michael Kruse (Meinersbur) ChangesAdd support for First step to add
This patch is functional end-to-end and adds support for This is a continuation of #71712 and a cherry-pick of a more comprehensive patch (but taking too long) that I was working on. Currently unroll only works standalone and cannot yet combined with other directive. The following features still have to be implemented:
PR #143715 adds support for the tile construct if it is the only transformation applied to a loop nest before a non-transformation loop-associated construct. It does to by adding a Related:
Patch is 77.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144785.diff 24 Files Affected:
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 82673f0948a5b..3a8c7dcb0690a 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -2128,6 +2128,161 @@ genLoopOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
return loopOp;
}
+static mlir::omp::CanonicalLoopOp
+genCanonicalLoopOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
+ semantics::SemanticsContext &semaCtx,
+ lower::pft::Evaluation &eval, mlir::Location loc,
+ const ConstructQueue &queue,
+ ConstructQueue::const_iterator item,
+ llvm::ArrayRef<const semantics::Symbol *> ivs,
+ llvm::omp::Directive directive, DataSharingProcessor &dsp) {
+ fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+ assert(ivs.size() == 1 && "Nested loops not yet implemented");
+ const semantics::Symbol *iv = ivs[0];
+
+ auto &nestedEval = eval.getFirstNestedEvaluation();
+ if (nestedEval.getIf<parser::DoConstruct>()->IsDoConcurrent()) {
+ TODO(loc, "Do Concurrent in unroll construct");
+ }
+
+ // Get the loop bounds (and increment)
+ auto &doLoopEval = nestedEval.getFirstNestedEvaluation();
+ auto *doStmt = doLoopEval.getIf<parser::NonLabelDoStmt>();
+ assert(doStmt && "Expected do loop to be in the nested evaluation");
+ auto &loopControl = std::get<std::optional<parser::LoopControl>>(doStmt->t);
+ assert(loopControl.has_value());
+ auto *bounds = std::get_if<parser::LoopControl::Bounds>(&loopControl->u);
+ assert(bounds && "Expected bounds for canonical loop");
+ lower::StatementContext stmtCtx;
+ mlir::Value loopLBVar = fir::getBase(
+ converter.genExprValue(*semantics::GetExpr(bounds->lower), stmtCtx));
+ mlir::Value loopUBVar = fir::getBase(
+ converter.genExprValue(*semantics::GetExpr(bounds->upper), stmtCtx));
+ mlir::Value loopStepVar = [&]() {
+ if (bounds->step) {
+ return fir::getBase(
+ converter.genExprValue(*semantics::GetExpr(bounds->step), stmtCtx));
+ } else {
+ // If `step` is not present, assume it is `1`.
+ return firOpBuilder.createIntegerConstant(loc, firOpBuilder.getI32Type(),
+ 1);
+ }
+ }();
+
+ // Get the integer kind for the loop variable and cast the loop bounds
+ size_t loopVarTypeSize = bounds->name.thing.symbol->GetUltimate().size();
+ mlir::Type loopVarType = getLoopVarType(converter, loopVarTypeSize);
+ loopLBVar = firOpBuilder.createConvert(loc, loopVarType, loopLBVar);
+ loopUBVar = firOpBuilder.createConvert(loc, loopVarType, loopUBVar);
+ loopStepVar = firOpBuilder.createConvert(loc, loopVarType, loopStepVar);
+
+ // Start lowering
+ mlir::Value zero = firOpBuilder.createIntegerConstant(loc, loopVarType, 0);
+ mlir::Value one = firOpBuilder.createIntegerConstant(loc, loopVarType, 1);
+ mlir::Value isDownwards = firOpBuilder.create<mlir::arith::CmpIOp>(
+ loc, mlir::arith::CmpIPredicate::slt, loopStepVar, zero);
+
+ // Ensure we are counting upwards. If not, negate step and swap lb and ub.
+ mlir::Value negStep =
+ firOpBuilder.create<mlir::arith::SubIOp>(loc, zero, loopStepVar);
+ mlir::Value incr = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isDownwards, negStep, loopStepVar);
+ mlir::Value lb = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isDownwards, loopUBVar, loopLBVar);
+ mlir::Value ub = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isDownwards, loopLBVar, loopUBVar);
+
+ // Compute the trip count assuming lb <= ub. This guarantees that the result
+ // is non-negative and we can use unsigned arithmetic.
+ mlir::Value span = firOpBuilder.create<mlir::arith::SubIOp>(
+ loc, ub, lb, ::mlir::arith::IntegerOverflowFlags::nuw);
+ mlir::Value tcMinusOne =
+ firOpBuilder.create<mlir::arith::DivUIOp>(loc, span, incr);
+ mlir::Value tcIfLooping = firOpBuilder.create<mlir::arith::AddIOp>(
+ loc, tcMinusOne, one, ::mlir::arith::IntegerOverflowFlags::nuw);
+
+ // Fall back to 0 if lb > ub
+ mlir::Value isZeroTC = firOpBuilder.create<mlir::arith::CmpIOp>(
+ loc, mlir::arith::CmpIPredicate::slt, ub, lb);
+ mlir::Value tripcount = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isZeroTC, zero, tcIfLooping);
+
+ // Create the CLI handle.
+ auto newcli = firOpBuilder.create<mlir::omp::NewCliOp>(loc);
+ mlir::Value cli = newcli.getResult();
+
+ auto ivCallback = [&](mlir::Operation *op)
+ -> llvm::SmallVector<const Fortran::semantics::Symbol *> {
+ mlir::Region ®ion = op->getRegion(0);
+
+ // Create the op's region skeleton (BB taking the iv as argument)
+ firOpBuilder.createBlock(®ion, {}, {loopVarType}, {loc});
+
+ // Compute the value of the loop variable from the logical iteration number.
+ mlir::Value natIterNum = fir::getBase(region.front().getArgument(0));
+ mlir::Value scaled =
+ firOpBuilder.create<mlir::arith::MulIOp>(loc, natIterNum, loopStepVar);
+ mlir::Value userVal =
+ firOpBuilder.create<mlir::arith::AddIOp>(loc, loopLBVar, scaled);
+
+ // The argument is not currently in memory, so make a temporary for the
+ // argument, and store it there, then bind that location to the argument.
+ mlir::Operation *storeOp =
+ createAndSetPrivatizedLoopVar(converter, loc, userVal, iv);
+
+ firOpBuilder.setInsertionPointAfter(storeOp);
+ return {iv};
+ };
+
+ // Create the omp.canonical_loop operation
+ auto canonLoop = genOpWithBody<mlir::omp::CanonicalLoopOp>(
+ OpWithBodyGenInfo(converter, symTable, semaCtx, loc, nestedEval,
+ directive)
+ .setClauses(&item->clauses)
+ .setDataSharingProcessor(&dsp)
+ .setGenRegionEntryCb(ivCallback),
+ queue, item, tripcount, cli);
+
+ firOpBuilder.setInsertionPointAfter(canonLoop);
+ return canonLoop;
+}
+
+static void genUnrollOp(Fortran::lower::AbstractConverter &converter,
+ Fortran::lower::SymMap &symTable,
+ lower::StatementContext &stmtCtx,
+ Fortran::semantics::SemanticsContext &semaCtx,
+ Fortran::lower::pft::Evaluation &eval,
+ mlir::Location loc, const ConstructQueue &queue,
+ ConstructQueue::const_iterator item) {
+ fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+ mlir::omp::LoopRelatedClauseOps loopInfo;
+ llvm::SmallVector<const semantics::Symbol *> iv;
+ collectLoopRelatedInfo(converter, loc, eval, item->clauses, loopInfo, iv);
+
+ // Clauses for unrolling not yet implemnted
+ ClauseProcessor cp(converter, semaCtx, item->clauses);
+ cp.processTODO<clause::Partial, clause::Full>(
+ loc, llvm::omp::Directive::OMPD_unroll);
+
+ // Even though unroll does not support data-sharing clauses, but this is
+ // required to fill the symbol table.
+ DataSharingProcessor dsp(converter, semaCtx, item->clauses, eval,
+ /*shouldCollectPreDeterminedSymbols=*/true,
+ /*useDelayedPrivatization=*/false, symTable);
+ dsp.processStep1();
+
+ // Emit the associated loop
+ auto canonLoop =
+ genCanonicalLoopOp(converter, symTable, semaCtx, eval, loc, queue, item,
+ iv, llvm::omp::Directive::OMPD_unroll, dsp);
+
+ // Apply unrolling to it
+ auto cli = canonLoop.getCli();
+ firOpBuilder.create<mlir::omp::UnrollHeuristicOp>(loc, cli);
+}
+
static mlir::omp::MaskedOp
genMaskedOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
lower::StatementContext &stmtCtx,
@@ -3516,12 +3671,9 @@ static void genOMPDispatch(lower::AbstractConverter &converter,
newOp = genTeamsOp(converter, symTable, stmtCtx, semaCtx, eval, loc, queue,
item);
break;
- case llvm::omp::Directive::OMPD_tile:
- case llvm::omp::Directive::OMPD_unroll: {
- unsigned version = semaCtx.langOptions().OpenMPVersion;
- TODO(loc, "Unhandled loop directive (" +
- llvm::omp::getOpenMPDirectiveName(dir, version) + ")");
- }
+ case llvm::omp::Directive::OMPD_unroll:
+ genUnrollOp(converter, symTable, stmtCtx, semaCtx, eval, loc, queue, item);
+ break;
// case llvm::omp::Directive::OMPD_workdistribute:
case llvm::omp::Directive::OMPD_workshare:
newOp = genWorkshareOp(converter, symTable, stmtCtx, semaCtx, eval, loc,
diff --git a/flang/test/Lower/OpenMP/unroll-heuristic01.f90 b/flang/test/Lower/OpenMP/unroll-heuristic01.f90
new file mode 100644
index 0000000000000..a5f5c003b8a7c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/unroll-heuristic01.f90
@@ -0,0 +1,39 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=51 -o - %s 2>&1 | FileCheck %s
+
+
+subroutine omp_unroll_heuristic01(lb, ub, inc)
+ integer res, i, lb, ub, inc
+
+ !$omp unroll
+ do i = lb, ub, inc
+ res = i
+ end do
+ !$omp end unroll
+
+end subroutine omp_unroll_heuristic01
+
+
+!CHECK-LABEL: func.func @_QPomp_unroll_heuristic01(
+!CHECK: %c0_i32 = arith.constant 0 : i32
+!CHECK-NEXT: %c1_i32 = arith.constant 1 : i32
+!CHECK-NEXT: %13 = arith.cmpi slt, %12, %c0_i32 : i32
+!CHECK-NEXT: %14 = arith.subi %c0_i32, %12 : i32
+!CHECK-NEXT: %15 = arith.select %13, %14, %12 : i32
+!CHECK-NEXT: %16 = arith.select %13, %11, %10 : i32
+!CHECK-NEXT: %17 = arith.select %13, %10, %11 : i32
+!CHECK-NEXT: %18 = arith.subi %17, %16 overflow<nuw> : i32
+!CHECK-NEXT: %19 = arith.divui %18, %15 : i32
+!CHECK-NEXT: %20 = arith.addi %19, %c1_i32 overflow<nuw> : i32
+!CHECK-NEXT: %21 = arith.cmpi slt, %17, %16 : i32
+!CHECK-NEXT: %22 = arith.select %21, %c0_i32, %20 : i32
+!CHECK-NEXT: %canonloop_s0 = omp.new_cli
+!CHECK-NEXT: omp.canonical_loop(%canonloop_s0) %iv : i32 in range(%22) {
+!CHECK-NEXT: %23 = arith.muli %iv, %12 : i32
+!CHECK-NEXT: %24 = arith.addi %10, %23 : i32
+!CHECK-NEXT: hlfir.assign %24 to %9#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: %25 = fir.load %9#0 : !fir.ref<i32>
+!CHECK-NEXT: hlfir.assign %25 to %6#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: omp.terminator
+!CHECK-NEXT: }
+!CHECK-NEXT: omp.unroll_heuristic(%canonloop_s0)
+!CHECK-NEXT: return
diff --git a/flang/test/Lower/OpenMP/unroll-heuristic02.f90 b/flang/test/Lower/OpenMP/unroll-heuristic02.f90
new file mode 100644
index 0000000000000..669f185f910c4
--- /dev/null
+++ b/flang/test/Lower/OpenMP/unroll-heuristic02.f90
@@ -0,0 +1,70 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=51 -o - %s 2>&1 | FileCheck %s
+
+
+subroutine omp_unroll_heuristic_nested02(outer_lb, outer_ub, outer_inc, inner_lb, inner_ub, inner_inc)
+ integer res, i, j, inner_lb, inner_ub, inner_inc, outer_lb, outer_ub, outer_inc
+
+ !$omp unroll
+ do i = outer_lb, outer_ub, outer_inc
+ !$omp unroll
+ do j = inner_lb, inner_ub, inner_inc
+ res = i + j
+ end do
+ !$omp end unroll
+ end do
+ !$omp end unroll
+
+end subroutine omp_unroll_heuristic_nested02
+
+
+!CHECK-LABEL: func.func @_QPomp_unroll_heuristic_nested02(%arg0: !fir.ref<i32> {fir.bindc_name = "outer_lb"}, %arg1: !fir.ref<i32> {fir.bindc_name = "outer_ub"}, %arg2: !fir.ref<i32> {fir.bindc_name = "outer_inc"}, %arg3: !fir.ref<i32> {fir.bindc_name = "inner_lb"}, %arg4: !fir.ref<i32> {fir.bindc_name = "inner_ub"}, %arg5: !fir.ref<i32> {fir.bindc_name = "inner_inc"}) {
+!CHECK: %c0_i32 = arith.constant 0 : i32
+!CHECK-NEXT: %c1_i32 = arith.constant 1 : i32
+!CHECK-NEXT: %18 = arith.cmpi slt, %17, %c0_i32 : i32
+!CHECK-NEXT: %19 = arith.subi %c0_i32, %17 : i32
+!CHECK-NEXT: %20 = arith.select %18, %19, %17 : i32
+!CHECK-NEXT: %21 = arith.select %18, %16, %15 : i32
+!CHECK-NEXT: %22 = arith.select %18, %15, %16 : i32
+!CHECK-NEXT: %23 = arith.subi %22, %21 overflow<nuw> : i32
+!CHECK-NEXT: %24 = arith.divui %23, %20 : i32
+!CHECK-NEXT: %25 = arith.addi %24, %c1_i32 overflow<nuw> : i32
+!CHECK-NEXT: %26 = arith.cmpi slt, %22, %21 : i32
+!CHECK-NEXT: %27 = arith.select %26, %c0_i32, %25 : i32
+!CHECK-NEXT: %canonloop_s0 = omp.new_cli
+!CHECK-NEXT: omp.canonical_loop(%canonloop_s0) %iv : i32 in range(%27) {
+!CHECK-NEXT: %28 = arith.muli %iv, %17 : i32
+!CHECK-NEXT: %29 = arith.addi %15, %28 : i32
+!CHECK-NEXT: hlfir.assign %29 to %14#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: %30 = fir.alloca i32 {bindc_name = "j", pinned, uniq_name = "_QFomp_unroll_heuristic_nested02Ej"}
+!CHECK-NEXT: %31:2 = hlfir.declare %30 {uniq_name = "_QFomp_unroll_heuristic_nested02Ej"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK-NEXT: %32 = fir.load %4#0 : !fir.ref<i32>
+!CHECK-NEXT: %33 = fir.load %5#0 : !fir.ref<i32>
+!CHECK-NEXT: %34 = fir.load %3#0 : !fir.ref<i32>
+!CHECK-NEXT: %c0_i32_0 = arith.constant 0 : i32
+!CHECK-NEXT: %c1_i32_1 = arith.constant 1 : i32
+!CHECK-NEXT: %35 = arith.cmpi slt, %34, %c0_i32_0 : i32
+!CHECK-NEXT: %36 = arith.subi %c0_i32_0, %34 : i32
+!CHECK-NEXT: %37 = arith.select %35, %36, %34 : i32
+!CHECK-NEXT: %38 = arith.select %35, %33, %32 : i32
+!CHECK-NEXT: %39 = arith.select %35, %32, %33 : i32
+!CHECK-NEXT: %40 = arith.subi %39, %38 overflow<nuw> : i32
+!CHECK-NEXT: %41 = arith.divui %40, %37 : i32
+!CHECK-NEXT: %42 = arith.addi %41, %c1_i32_1 overflow<nuw> : i32
+!CHECK-NEXT: %43 = arith.cmpi slt, %39, %38 : i32
+!CHECK-NEXT: %44 = arith.select %43, %c0_i32_0, %42 : i32
+!CHECK-NEXT: %canonloop_s0_s0 = omp.new_cli
+!CHECK-NEXT: omp.canonical_loop(%canonloop_s0_s0) %iv_2 : i32 in range(%44) {
+!CHECK-NEXT: %45 = arith.muli %iv_2, %34 : i32
+!CHECK-NEXT: %46 = arith.addi %32, %45 : i32
+!CHECK-NEXT: hlfir.assign %46 to %31#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: %47 = fir.load %14#0 : !fir.ref<i32>
+!CHECK-NEXT: %48 = fir.load %31#0 : !fir.ref<i32>
+!CHECK-NEXT: %49 = arith.addi %47, %48 : i32
+!CHECK-NEXT: hlfir.assign %49 to %12#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: omp.terminator
+!CHECK-NEXT: }
+!CHECK-NEXT: omp.unroll_heuristic(%canonloop_s0_s0)
+!CHECK-NEXT: omp.terminator
+!CHECK-NEXT: }
+!CHECK-NEXT: omp.unroll_heuristic(%canonloop_s0)
+!CHECK-NEXT: return
diff --git a/flang/test/Parser/OpenMP/unroll-heuristic.f90 b/flang/test/Parser/OpenMP/unroll-heuristic.f90
new file mode 100644
index 0000000000000..2f589af0c83ca
--- /dev/null
+++ b/flang/test/Parser/OpenMP/unroll-heuristic.f90
@@ -0,0 +1,43 @@
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=51 %s -fdebug-unparse | FileCheck --check-prefix=UNPARSE %s
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=51 %s -fdebug-dump-parse-tree | FileCheck --check-prefix=PTREE %s
+
+subroutine openmp_parse_unroll_heuristic
+ integer i
+
+ !$omp unroll
+ do i = 1, 100
+ call func(i)
+ end do
+ !$omp end unroll
+END subroutine openmp_parse_unroll_heuristic
+
+
+!UNPARSE: !$OMP UNROLL
+!UNPARSE-NEXT: DO i=1_4,100_4
+!UNPARSE-NEXT: CALL func(i)
+!UNPARSE-NEXT: END DO
+!UNPARSE-NEXT: !$OMP END UNROLL
+
+!PTREE: OpenMPConstruct -> OpenMPLoopConstruct
+!PTREE-NEXT: | OmpBeginLoopDirective
+!PTREE-NEXT: | | OmpLoopDirective -> llvm::omp::Directive = unroll
+!PTREE-NEXT: | | OmpClauseList ->
+!PTREE-NEXT: | DoConstruct
+!PTREE-NEXT: | | NonLabelDoStmt
+!PTREE-NEXT: | | | LoopControl -> LoopBounds
+!PTREE-NEXT: | | | | Scalar -> Name = 'i'
+!PTREE-NEXT: | | | | Scalar -> Expr = '1_4'
+!PTREE-NEXT: | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!PTREE-NEXT: | | | | Scalar -> Expr = '100_4'
+!PTREE-NEXT: | | | | | LiteralConstant -> IntLiteralConstant = '100'
+!PTREE-NEXT: | | Block
+!PTREE-NEXT: | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> CallStmt = 'CALL func(i)'
+!PTREE-NEXT: | | | | | | Call
+!PTREE-NEXT: | | | | | ProcedureDesignator -> Name = 'func'
+!PTREE-NEXT: | | | | | ActualArgSpec
+!PTREE-NEXT: | | | | | | ActualArg -> Expr = 'i'
+!PTREE-NEXT: | | | | | | | Designator -> DataRef -> Name = 'i'
+!PTREE-NEXT: | | EndDoStmt ->
+!PTREE-NEXT: | OmpEndLoopDirective
+!PTREE-NEXT: | | OmpLoopDirective -> llvm::omp::Directive = unroll
+!PTREE-NEXT: | | OmpClauseList ->
diff --git a/flang/test/Parser/OpenMP/unroll.f90 b/flang/test/Parser/OpenMP/unroll-partial.f90
similarity index 100%
rename from flang/test/Parser/OpenMP/unroll.f90
rename to flang/test/Parser/OpenMP/unroll-partial.f90
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
index f9a85626a3f14..faf820dcfdb29 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
@@ -15,14 +15,10 @@
#ifndef MLIR_DIALECT_OPENMP_OPENMPCLAUSEOPERANDS_H_
#define MLIR_DIALECT_OPENMP_OPENMPCLAUSEOPERANDS_H_
+#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "llvm/ADT/SmallVector.h"
-#include "mlir/Dialect/OpenMP/OpenMPOpsEnums.h.inc"
-
-#define GET_ATTRDEF_CLASSES
-#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.h.inc"
-
#include "mlir/Dialect/OpenMP/OpenMPClauseOps.h.inc"
namespace mlir {
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h
index 248ac2eb72c61..0a844fc2380bf 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h
@@ -16,6 +16,7 @@
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/OpenACCMPCommon/Interfaces/AtomicInterfaces.h"
#include "mlir/Dialect/OpenACCMPCommon/Interfaces/OpenACCMPOpsInterfaces.h"
+#include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
#include "mlir/IR/Dialect.h"
#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/PatternMatch.h"
@@ -24,6 +25,11 @@
#include "mlir/Interfaces/SideEffectInterfaces.h"
#include "llvm/Frontend/OpenMP/OMPDeviceConstants.h"
+namespace mlir::omp {
+/// Find the omp.new_cli, generator, and consumer of a canonical loop info.
+std::tuple<NewCliOp, OpOperand *, OpOperand *> decodeCli(mlir::Value cli);
+} // namespace mlir::omp
+
#define GET_TYPEDEF_CLASSES
#include "mlir/Dialect/OpenMP/OpenMPOpsTypes.h.inc"
@@ -33,8 +39,6 @@
#include "mlir/Dialect/OpenMP/OpenMPTypeInterfaces.h.inc"
-#include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
-
#define GET_OP_CLASSES
#include "mlir/Dialect/OpenMP/OpenMPOps.h.inc"
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h
index 989ab1710c211..bc9534974d21f 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h
@@ -14,6 +14,7 @@
#define MLIR_DIALECT_OPENMP_OPENMPINTERFACES_H_
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
+#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.h"
#include "mlir/IR/Dialect.h"
#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/PatternMatch.h"
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td
index f3dd44d2c0717..bbcfb87fa03c6 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td
@@ -204,4 +204,15 @@ class OpenMP_Op<string mnemonic, list<Trait> traits = [],
let regions = !if(singleRegion, (region AnyRegion:$region), (region));
}
+
+// Base class for OpenMP loop transformations (that either consume or generate
+// loops)
+//
+// Doesn't actually create a C++ base class (only defines default values for
+// tablegen classes that derive from this). Use LoopTransformationInterface
+// instead for common operations.
+class OpenMPTransform_Op<string mnemonic, list<Trait> traits = []> :
+ OpenMP_Op<mnemonic, !listconcat([DeclareOpInterfaceMethods<LoopTransformationInterface>], traits) > {
+}
+
#endif // OPENMP_OP_BASE
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index ac80926053a2d..8641c9b8150ee 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -22,6 +22,7 @@ include "mlir/Dialect/OpenMP/OpenMPOpBase.td"
include "mlir/Interfaces/ControlFlowInterfaces.td"
include "mlir/Interfaces/SideEffectInterfaces.td"
include "mlir/IR/EnumAttr.td"
+include "mlir/IR/OpAsmInterface.td"
include "mlir/IR/OpBase.td"
include "mlir/IR/SymbolInterfaces.td"
@@ -356,6 +357,212 @@ def SingleOp : OpenMP_Op<"single", traits = [
let hasVerifier = 1;
}
+//===---...
[truncated]
|
@llvm/pr-subscribers-flang-fir-hlfir Author: Michael Kruse (Meinersbur) ChangesAdd support for First step to add
This patch is functional end-to-end and adds support for This is a continuation of #71712 and a cherry-pick of a more comprehensive patch (but taking too long) that I was working on. Currently unroll only works standalone and cannot yet combined with other directive. The following features still have to be implemented:
PR #143715 adds support for the tile construct if it is the only transformation applied to a loop nest before a non-transformation loop-associated construct. It does to by adding a Related:
Patch is 77.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144785.diff 24 Files Affected:
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 82673f0948a5b..3a8c7dcb0690a 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -2128,6 +2128,161 @@ genLoopOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
return loopOp;
}
+static mlir::omp::CanonicalLoopOp
+genCanonicalLoopOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
+ semantics::SemanticsContext &semaCtx,
+ lower::pft::Evaluation &eval, mlir::Location loc,
+ const ConstructQueue &queue,
+ ConstructQueue::const_iterator item,
+ llvm::ArrayRef<const semantics::Symbol *> ivs,
+ llvm::omp::Directive directive, DataSharingProcessor &dsp) {
+ fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+ assert(ivs.size() == 1 && "Nested loops not yet implemented");
+ const semantics::Symbol *iv = ivs[0];
+
+ auto &nestedEval = eval.getFirstNestedEvaluation();
+ if (nestedEval.getIf<parser::DoConstruct>()->IsDoConcurrent()) {
+ TODO(loc, "Do Concurrent in unroll construct");
+ }
+
+ // Get the loop bounds (and increment)
+ auto &doLoopEval = nestedEval.getFirstNestedEvaluation();
+ auto *doStmt = doLoopEval.getIf<parser::NonLabelDoStmt>();
+ assert(doStmt && "Expected do loop to be in the nested evaluation");
+ auto &loopControl = std::get<std::optional<parser::LoopControl>>(doStmt->t);
+ assert(loopControl.has_value());
+ auto *bounds = std::get_if<parser::LoopControl::Bounds>(&loopControl->u);
+ assert(bounds && "Expected bounds for canonical loop");
+ lower::StatementContext stmtCtx;
+ mlir::Value loopLBVar = fir::getBase(
+ converter.genExprValue(*semantics::GetExpr(bounds->lower), stmtCtx));
+ mlir::Value loopUBVar = fir::getBase(
+ converter.genExprValue(*semantics::GetExpr(bounds->upper), stmtCtx));
+ mlir::Value loopStepVar = [&]() {
+ if (bounds->step) {
+ return fir::getBase(
+ converter.genExprValue(*semantics::GetExpr(bounds->step), stmtCtx));
+ } else {
+ // If `step` is not present, assume it is `1`.
+ return firOpBuilder.createIntegerConstant(loc, firOpBuilder.getI32Type(),
+ 1);
+ }
+ }();
+
+ // Get the integer kind for the loop variable and cast the loop bounds
+ size_t loopVarTypeSize = bounds->name.thing.symbol->GetUltimate().size();
+ mlir::Type loopVarType = getLoopVarType(converter, loopVarTypeSize);
+ loopLBVar = firOpBuilder.createConvert(loc, loopVarType, loopLBVar);
+ loopUBVar = firOpBuilder.createConvert(loc, loopVarType, loopUBVar);
+ loopStepVar = firOpBuilder.createConvert(loc, loopVarType, loopStepVar);
+
+ // Start lowering
+ mlir::Value zero = firOpBuilder.createIntegerConstant(loc, loopVarType, 0);
+ mlir::Value one = firOpBuilder.createIntegerConstant(loc, loopVarType, 1);
+ mlir::Value isDownwards = firOpBuilder.create<mlir::arith::CmpIOp>(
+ loc, mlir::arith::CmpIPredicate::slt, loopStepVar, zero);
+
+ // Ensure we are counting upwards. If not, negate step and swap lb and ub.
+ mlir::Value negStep =
+ firOpBuilder.create<mlir::arith::SubIOp>(loc, zero, loopStepVar);
+ mlir::Value incr = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isDownwards, negStep, loopStepVar);
+ mlir::Value lb = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isDownwards, loopUBVar, loopLBVar);
+ mlir::Value ub = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isDownwards, loopLBVar, loopUBVar);
+
+ // Compute the trip count assuming lb <= ub. This guarantees that the result
+ // is non-negative and we can use unsigned arithmetic.
+ mlir::Value span = firOpBuilder.create<mlir::arith::SubIOp>(
+ loc, ub, lb, ::mlir::arith::IntegerOverflowFlags::nuw);
+ mlir::Value tcMinusOne =
+ firOpBuilder.create<mlir::arith::DivUIOp>(loc, span, incr);
+ mlir::Value tcIfLooping = firOpBuilder.create<mlir::arith::AddIOp>(
+ loc, tcMinusOne, one, ::mlir::arith::IntegerOverflowFlags::nuw);
+
+ // Fall back to 0 if lb > ub
+ mlir::Value isZeroTC = firOpBuilder.create<mlir::arith::CmpIOp>(
+ loc, mlir::arith::CmpIPredicate::slt, ub, lb);
+ mlir::Value tripcount = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isZeroTC, zero, tcIfLooping);
+
+ // Create the CLI handle.
+ auto newcli = firOpBuilder.create<mlir::omp::NewCliOp>(loc);
+ mlir::Value cli = newcli.getResult();
+
+ auto ivCallback = [&](mlir::Operation *op)
+ -> llvm::SmallVector<const Fortran::semantics::Symbol *> {
+ mlir::Region ®ion = op->getRegion(0);
+
+ // Create the op's region skeleton (BB taking the iv as argument)
+ firOpBuilder.createBlock(®ion, {}, {loopVarType}, {loc});
+
+ // Compute the value of the loop variable from the logical iteration number.
+ mlir::Value natIterNum = fir::getBase(region.front().getArgument(0));
+ mlir::Value scaled =
+ firOpBuilder.create<mlir::arith::MulIOp>(loc, natIterNum, loopStepVar);
+ mlir::Value userVal =
+ firOpBuilder.create<mlir::arith::AddIOp>(loc, loopLBVar, scaled);
+
+ // The argument is not currently in memory, so make a temporary for the
+ // argument, and store it there, then bind that location to the argument.
+ mlir::Operation *storeOp =
+ createAndSetPrivatizedLoopVar(converter, loc, userVal, iv);
+
+ firOpBuilder.setInsertionPointAfter(storeOp);
+ return {iv};
+ };
+
+ // Create the omp.canonical_loop operation
+ auto canonLoop = genOpWithBody<mlir::omp::CanonicalLoopOp>(
+ OpWithBodyGenInfo(converter, symTable, semaCtx, loc, nestedEval,
+ directive)
+ .setClauses(&item->clauses)
+ .setDataSharingProcessor(&dsp)
+ .setGenRegionEntryCb(ivCallback),
+ queue, item, tripcount, cli);
+
+ firOpBuilder.setInsertionPointAfter(canonLoop);
+ return canonLoop;
+}
+
+static void genUnrollOp(Fortran::lower::AbstractConverter &converter,
+ Fortran::lower::SymMap &symTable,
+ lower::StatementContext &stmtCtx,
+ Fortran::semantics::SemanticsContext &semaCtx,
+ Fortran::lower::pft::Evaluation &eval,
+ mlir::Location loc, const ConstructQueue &queue,
+ ConstructQueue::const_iterator item) {
+ fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+ mlir::omp::LoopRelatedClauseOps loopInfo;
+ llvm::SmallVector<const semantics::Symbol *> iv;
+ collectLoopRelatedInfo(converter, loc, eval, item->clauses, loopInfo, iv);
+
+ // Clauses for unrolling not yet implemnted
+ ClauseProcessor cp(converter, semaCtx, item->clauses);
+ cp.processTODO<clause::Partial, clause::Full>(
+ loc, llvm::omp::Directive::OMPD_unroll);
+
+ // Even though unroll does not support data-sharing clauses, but this is
+ // required to fill the symbol table.
+ DataSharingProcessor dsp(converter, semaCtx, item->clauses, eval,
+ /*shouldCollectPreDeterminedSymbols=*/true,
+ /*useDelayedPrivatization=*/false, symTable);
+ dsp.processStep1();
+
+ // Emit the associated loop
+ auto canonLoop =
+ genCanonicalLoopOp(converter, symTable, semaCtx, eval, loc, queue, item,
+ iv, llvm::omp::Directive::OMPD_unroll, dsp);
+
+ // Apply unrolling to it
+ auto cli = canonLoop.getCli();
+ firOpBuilder.create<mlir::omp::UnrollHeuristicOp>(loc, cli);
+}
+
static mlir::omp::MaskedOp
genMaskedOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
lower::StatementContext &stmtCtx,
@@ -3516,12 +3671,9 @@ static void genOMPDispatch(lower::AbstractConverter &converter,
newOp = genTeamsOp(converter, symTable, stmtCtx, semaCtx, eval, loc, queue,
item);
break;
- case llvm::omp::Directive::OMPD_tile:
- case llvm::omp::Directive::OMPD_unroll: {
- unsigned version = semaCtx.langOptions().OpenMPVersion;
- TODO(loc, "Unhandled loop directive (" +
- llvm::omp::getOpenMPDirectiveName(dir, version) + ")");
- }
+ case llvm::omp::Directive::OMPD_unroll:
+ genUnrollOp(converter, symTable, stmtCtx, semaCtx, eval, loc, queue, item);
+ break;
// case llvm::omp::Directive::OMPD_workdistribute:
case llvm::omp::Directive::OMPD_workshare:
newOp = genWorkshareOp(converter, symTable, stmtCtx, semaCtx, eval, loc,
diff --git a/flang/test/Lower/OpenMP/unroll-heuristic01.f90 b/flang/test/Lower/OpenMP/unroll-heuristic01.f90
new file mode 100644
index 0000000000000..a5f5c003b8a7c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/unroll-heuristic01.f90
@@ -0,0 +1,39 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=51 -o - %s 2>&1 | FileCheck %s
+
+
+subroutine omp_unroll_heuristic01(lb, ub, inc)
+ integer res, i, lb, ub, inc
+
+ !$omp unroll
+ do i = lb, ub, inc
+ res = i
+ end do
+ !$omp end unroll
+
+end subroutine omp_unroll_heuristic01
+
+
+!CHECK-LABEL: func.func @_QPomp_unroll_heuristic01(
+!CHECK: %c0_i32 = arith.constant 0 : i32
+!CHECK-NEXT: %c1_i32 = arith.constant 1 : i32
+!CHECK-NEXT: %13 = arith.cmpi slt, %12, %c0_i32 : i32
+!CHECK-NEXT: %14 = arith.subi %c0_i32, %12 : i32
+!CHECK-NEXT: %15 = arith.select %13, %14, %12 : i32
+!CHECK-NEXT: %16 = arith.select %13, %11, %10 : i32
+!CHECK-NEXT: %17 = arith.select %13, %10, %11 : i32
+!CHECK-NEXT: %18 = arith.subi %17, %16 overflow<nuw> : i32
+!CHECK-NEXT: %19 = arith.divui %18, %15 : i32
+!CHECK-NEXT: %20 = arith.addi %19, %c1_i32 overflow<nuw> : i32
+!CHECK-NEXT: %21 = arith.cmpi slt, %17, %16 : i32
+!CHECK-NEXT: %22 = arith.select %21, %c0_i32, %20 : i32
+!CHECK-NEXT: %canonloop_s0 = omp.new_cli
+!CHECK-NEXT: omp.canonical_loop(%canonloop_s0) %iv : i32 in range(%22) {
+!CHECK-NEXT: %23 = arith.muli %iv, %12 : i32
+!CHECK-NEXT: %24 = arith.addi %10, %23 : i32
+!CHECK-NEXT: hlfir.assign %24 to %9#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: %25 = fir.load %9#0 : !fir.ref<i32>
+!CHECK-NEXT: hlfir.assign %25 to %6#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: omp.terminator
+!CHECK-NEXT: }
+!CHECK-NEXT: omp.unroll_heuristic(%canonloop_s0)
+!CHECK-NEXT: return
diff --git a/flang/test/Lower/OpenMP/unroll-heuristic02.f90 b/flang/test/Lower/OpenMP/unroll-heuristic02.f90
new file mode 100644
index 0000000000000..669f185f910c4
--- /dev/null
+++ b/flang/test/Lower/OpenMP/unroll-heuristic02.f90
@@ -0,0 +1,70 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=51 -o - %s 2>&1 | FileCheck %s
+
+
+subroutine omp_unroll_heuristic_nested02(outer_lb, outer_ub, outer_inc, inner_lb, inner_ub, inner_inc)
+ integer res, i, j, inner_lb, inner_ub, inner_inc, outer_lb, outer_ub, outer_inc
+
+ !$omp unroll
+ do i = outer_lb, outer_ub, outer_inc
+ !$omp unroll
+ do j = inner_lb, inner_ub, inner_inc
+ res = i + j
+ end do
+ !$omp end unroll
+ end do
+ !$omp end unroll
+
+end subroutine omp_unroll_heuristic_nested02
+
+
+!CHECK-LABEL: func.func @_QPomp_unroll_heuristic_nested02(%arg0: !fir.ref<i32> {fir.bindc_name = "outer_lb"}, %arg1: !fir.ref<i32> {fir.bindc_name = "outer_ub"}, %arg2: !fir.ref<i32> {fir.bindc_name = "outer_inc"}, %arg3: !fir.ref<i32> {fir.bindc_name = "inner_lb"}, %arg4: !fir.ref<i32> {fir.bindc_name = "inner_ub"}, %arg5: !fir.ref<i32> {fir.bindc_name = "inner_inc"}) {
+!CHECK: %c0_i32 = arith.constant 0 : i32
+!CHECK-NEXT: %c1_i32 = arith.constant 1 : i32
+!CHECK-NEXT: %18 = arith.cmpi slt, %17, %c0_i32 : i32
+!CHECK-NEXT: %19 = arith.subi %c0_i32, %17 : i32
+!CHECK-NEXT: %20 = arith.select %18, %19, %17 : i32
+!CHECK-NEXT: %21 = arith.select %18, %16, %15 : i32
+!CHECK-NEXT: %22 = arith.select %18, %15, %16 : i32
+!CHECK-NEXT: %23 = arith.subi %22, %21 overflow<nuw> : i32
+!CHECK-NEXT: %24 = arith.divui %23, %20 : i32
+!CHECK-NEXT: %25 = arith.addi %24, %c1_i32 overflow<nuw> : i32
+!CHECK-NEXT: %26 = arith.cmpi slt, %22, %21 : i32
+!CHECK-NEXT: %27 = arith.select %26, %c0_i32, %25 : i32
+!CHECK-NEXT: %canonloop_s0 = omp.new_cli
+!CHECK-NEXT: omp.canonical_loop(%canonloop_s0) %iv : i32 in range(%27) {
+!CHECK-NEXT: %28 = arith.muli %iv, %17 : i32
+!CHECK-NEXT: %29 = arith.addi %15, %28 : i32
+!CHECK-NEXT: hlfir.assign %29 to %14#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: %30 = fir.alloca i32 {bindc_name = "j", pinned, uniq_name = "_QFomp_unroll_heuristic_nested02Ej"}
+!CHECK-NEXT: %31:2 = hlfir.declare %30 {uniq_name = "_QFomp_unroll_heuristic_nested02Ej"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK-NEXT: %32 = fir.load %4#0 : !fir.ref<i32>
+!CHECK-NEXT: %33 = fir.load %5#0 : !fir.ref<i32>
+!CHECK-NEXT: %34 = fir.load %3#0 : !fir.ref<i32>
+!CHECK-NEXT: %c0_i32_0 = arith.constant 0 : i32
+!CHECK-NEXT: %c1_i32_1 = arith.constant 1 : i32
+!CHECK-NEXT: %35 = arith.cmpi slt, %34, %c0_i32_0 : i32
+!CHECK-NEXT: %36 = arith.subi %c0_i32_0, %34 : i32
+!CHECK-NEXT: %37 = arith.select %35, %36, %34 : i32
+!CHECK-NEXT: %38 = arith.select %35, %33, %32 : i32
+!CHECK-NEXT: %39 = arith.select %35, %32, %33 : i32
+!CHECK-NEXT: %40 = arith.subi %39, %38 overflow<nuw> : i32
+!CHECK-NEXT: %41 = arith.divui %40, %37 : i32
+!CHECK-NEXT: %42 = arith.addi %41, %c1_i32_1 overflow<nuw> : i32
+!CHECK-NEXT: %43 = arith.cmpi slt, %39, %38 : i32
+!CHECK-NEXT: %44 = arith.select %43, %c0_i32_0, %42 : i32
+!CHECK-NEXT: %canonloop_s0_s0 = omp.new_cli
+!CHECK-NEXT: omp.canonical_loop(%canonloop_s0_s0) %iv_2 : i32 in range(%44) {
+!CHECK-NEXT: %45 = arith.muli %iv_2, %34 : i32
+!CHECK-NEXT: %46 = arith.addi %32, %45 : i32
+!CHECK-NEXT: hlfir.assign %46 to %31#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: %47 = fir.load %14#0 : !fir.ref<i32>
+!CHECK-NEXT: %48 = fir.load %31#0 : !fir.ref<i32>
+!CHECK-NEXT: %49 = arith.addi %47, %48 : i32
+!CHECK-NEXT: hlfir.assign %49 to %12#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: omp.terminator
+!CHECK-NEXT: }
+!CHECK-NEXT: omp.unroll_heuristic(%canonloop_s0_s0)
+!CHECK-NEXT: omp.terminator
+!CHECK-NEXT: }
+!CHECK-NEXT: omp.unroll_heuristic(%canonloop_s0)
+!CHECK-NEXT: return
diff --git a/flang/test/Parser/OpenMP/unroll-heuristic.f90 b/flang/test/Parser/OpenMP/unroll-heuristic.f90
new file mode 100644
index 0000000000000..2f589af0c83ca
--- /dev/null
+++ b/flang/test/Parser/OpenMP/unroll-heuristic.f90
@@ -0,0 +1,43 @@
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=51 %s -fdebug-unparse | FileCheck --check-prefix=UNPARSE %s
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=51 %s -fdebug-dump-parse-tree | FileCheck --check-prefix=PTREE %s
+
+subroutine openmp_parse_unroll_heuristic
+ integer i
+
+ !$omp unroll
+ do i = 1, 100
+ call func(i)
+ end do
+ !$omp end unroll
+END subroutine openmp_parse_unroll_heuristic
+
+
+!UNPARSE: !$OMP UNROLL
+!UNPARSE-NEXT: DO i=1_4,100_4
+!UNPARSE-NEXT: CALL func(i)
+!UNPARSE-NEXT: END DO
+!UNPARSE-NEXT: !$OMP END UNROLL
+
+!PTREE: OpenMPConstruct -> OpenMPLoopConstruct
+!PTREE-NEXT: | OmpBeginLoopDirective
+!PTREE-NEXT: | | OmpLoopDirective -> llvm::omp::Directive = unroll
+!PTREE-NEXT: | | OmpClauseList ->
+!PTREE-NEXT: | DoConstruct
+!PTREE-NEXT: | | NonLabelDoStmt
+!PTREE-NEXT: | | | LoopControl -> LoopBounds
+!PTREE-NEXT: | | | | Scalar -> Name = 'i'
+!PTREE-NEXT: | | | | Scalar -> Expr = '1_4'
+!PTREE-NEXT: | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!PTREE-NEXT: | | | | Scalar -> Expr = '100_4'
+!PTREE-NEXT: | | | | | LiteralConstant -> IntLiteralConstant = '100'
+!PTREE-NEXT: | | Block
+!PTREE-NEXT: | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> CallStmt = 'CALL func(i)'
+!PTREE-NEXT: | | | | | | Call
+!PTREE-NEXT: | | | | | ProcedureDesignator -> Name = 'func'
+!PTREE-NEXT: | | | | | ActualArgSpec
+!PTREE-NEXT: | | | | | | ActualArg -> Expr = 'i'
+!PTREE-NEXT: | | | | | | | Designator -> DataRef -> Name = 'i'
+!PTREE-NEXT: | | EndDoStmt ->
+!PTREE-NEXT: | OmpEndLoopDirective
+!PTREE-NEXT: | | OmpLoopDirective -> llvm::omp::Directive = unroll
+!PTREE-NEXT: | | OmpClauseList ->
diff --git a/flang/test/Parser/OpenMP/unroll.f90 b/flang/test/Parser/OpenMP/unroll-partial.f90
similarity index 100%
rename from flang/test/Parser/OpenMP/unroll.f90
rename to flang/test/Parser/OpenMP/unroll-partial.f90
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
index f9a85626a3f14..faf820dcfdb29 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
@@ -15,14 +15,10 @@
#ifndef MLIR_DIALECT_OPENMP_OPENMPCLAUSEOPERANDS_H_
#define MLIR_DIALECT_OPENMP_OPENMPCLAUSEOPERANDS_H_
+#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "llvm/ADT/SmallVector.h"
-#include "mlir/Dialect/OpenMP/OpenMPOpsEnums.h.inc"
-
-#define GET_ATTRDEF_CLASSES
-#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.h.inc"
-
#include "mlir/Dialect/OpenMP/OpenMPClauseOps.h.inc"
namespace mlir {
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h
index 248ac2eb72c61..0a844fc2380bf 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h
@@ -16,6 +16,7 @@
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/OpenACCMPCommon/Interfaces/AtomicInterfaces.h"
#include "mlir/Dialect/OpenACCMPCommon/Interfaces/OpenACCMPOpsInterfaces.h"
+#include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
#include "mlir/IR/Dialect.h"
#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/PatternMatch.h"
@@ -24,6 +25,11 @@
#include "mlir/Interfaces/SideEffectInterfaces.h"
#include "llvm/Frontend/OpenMP/OMPDeviceConstants.h"
+namespace mlir::omp {
+/// Find the omp.new_cli, generator, and consumer of a canonical loop info.
+std::tuple<NewCliOp, OpOperand *, OpOperand *> decodeCli(mlir::Value cli);
+} // namespace mlir::omp
+
#define GET_TYPEDEF_CLASSES
#include "mlir/Dialect/OpenMP/OpenMPOpsTypes.h.inc"
@@ -33,8 +39,6 @@
#include "mlir/Dialect/OpenMP/OpenMPTypeInterfaces.h.inc"
-#include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
-
#define GET_OP_CLASSES
#include "mlir/Dialect/OpenMP/OpenMPOps.h.inc"
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h
index 989ab1710c211..bc9534974d21f 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h
@@ -14,6 +14,7 @@
#define MLIR_DIALECT_OPENMP_OPENMPINTERFACES_H_
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
+#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.h"
#include "mlir/IR/Dialect.h"
#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/PatternMatch.h"
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td
index f3dd44d2c0717..bbcfb87fa03c6 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td
@@ -204,4 +204,15 @@ class OpenMP_Op<string mnemonic, list<Trait> traits = [],
let regions = !if(singleRegion, (region AnyRegion:$region), (region));
}
+
+// Base class for OpenMP loop transformations (that either consume or generate
+// loops)
+//
+// Doesn't actually create a C++ base class (only defines default values for
+// tablegen classes that derive from this). Use LoopTransformationInterface
+// instead for common operations.
+class OpenMPTransform_Op<string mnemonic, list<Trait> traits = []> :
+ OpenMP_Op<mnemonic, !listconcat([DeclareOpInterfaceMethods<LoopTransformationInterface>], traits) > {
+}
+
#endif // OPENMP_OP_BASE
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index ac80926053a2d..8641c9b8150ee 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -22,6 +22,7 @@ include "mlir/Dialect/OpenMP/OpenMPOpBase.td"
include "mlir/Interfaces/ControlFlowInterfaces.td"
include "mlir/Interfaces/SideEffectInterfaces.td"
include "mlir/IR/EnumAttr.td"
+include "mlir/IR/OpAsmInterface.td"
include "mlir/IR/OpBase.td"
include "mlir/IR/SymbolInterfaces.td"
@@ -356,6 +357,212 @@ def SingleOp : OpenMP_Op<"single", traits = [
let hasVerifier = 1;
}
+//===---...
[truncated]
|
@llvm/pr-subscribers-mlir-openmp Author: Michael Kruse (Meinersbur) ChangesAdd support for First step to add
This patch is functional end-to-end and adds support for This is a continuation of #71712 and a cherry-pick of a more comprehensive patch (but taking too long) that I was working on. Currently unroll only works standalone and cannot yet combined with other directive. The following features still have to be implemented:
PR #143715 adds support for the tile construct if it is the only transformation applied to a loop nest before a non-transformation loop-associated construct. It does to by adding a Related:
Patch is 77.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144785.diff 24 Files Affected:
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 82673f0948a5b..3a8c7dcb0690a 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -2128,6 +2128,161 @@ genLoopOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
return loopOp;
}
+static mlir::omp::CanonicalLoopOp
+genCanonicalLoopOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
+ semantics::SemanticsContext &semaCtx,
+ lower::pft::Evaluation &eval, mlir::Location loc,
+ const ConstructQueue &queue,
+ ConstructQueue::const_iterator item,
+ llvm::ArrayRef<const semantics::Symbol *> ivs,
+ llvm::omp::Directive directive, DataSharingProcessor &dsp) {
+ fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+ assert(ivs.size() == 1 && "Nested loops not yet implemented");
+ const semantics::Symbol *iv = ivs[0];
+
+ auto &nestedEval = eval.getFirstNestedEvaluation();
+ if (nestedEval.getIf<parser::DoConstruct>()->IsDoConcurrent()) {
+ TODO(loc, "Do Concurrent in unroll construct");
+ }
+
+ // Get the loop bounds (and increment)
+ auto &doLoopEval = nestedEval.getFirstNestedEvaluation();
+ auto *doStmt = doLoopEval.getIf<parser::NonLabelDoStmt>();
+ assert(doStmt && "Expected do loop to be in the nested evaluation");
+ auto &loopControl = std::get<std::optional<parser::LoopControl>>(doStmt->t);
+ assert(loopControl.has_value());
+ auto *bounds = std::get_if<parser::LoopControl::Bounds>(&loopControl->u);
+ assert(bounds && "Expected bounds for canonical loop");
+ lower::StatementContext stmtCtx;
+ mlir::Value loopLBVar = fir::getBase(
+ converter.genExprValue(*semantics::GetExpr(bounds->lower), stmtCtx));
+ mlir::Value loopUBVar = fir::getBase(
+ converter.genExprValue(*semantics::GetExpr(bounds->upper), stmtCtx));
+ mlir::Value loopStepVar = [&]() {
+ if (bounds->step) {
+ return fir::getBase(
+ converter.genExprValue(*semantics::GetExpr(bounds->step), stmtCtx));
+ } else {
+ // If `step` is not present, assume it is `1`.
+ return firOpBuilder.createIntegerConstant(loc, firOpBuilder.getI32Type(),
+ 1);
+ }
+ }();
+
+ // Get the integer kind for the loop variable and cast the loop bounds
+ size_t loopVarTypeSize = bounds->name.thing.symbol->GetUltimate().size();
+ mlir::Type loopVarType = getLoopVarType(converter, loopVarTypeSize);
+ loopLBVar = firOpBuilder.createConvert(loc, loopVarType, loopLBVar);
+ loopUBVar = firOpBuilder.createConvert(loc, loopVarType, loopUBVar);
+ loopStepVar = firOpBuilder.createConvert(loc, loopVarType, loopStepVar);
+
+ // Start lowering
+ mlir::Value zero = firOpBuilder.createIntegerConstant(loc, loopVarType, 0);
+ mlir::Value one = firOpBuilder.createIntegerConstant(loc, loopVarType, 1);
+ mlir::Value isDownwards = firOpBuilder.create<mlir::arith::CmpIOp>(
+ loc, mlir::arith::CmpIPredicate::slt, loopStepVar, zero);
+
+ // Ensure we are counting upwards. If not, negate step and swap lb and ub.
+ mlir::Value negStep =
+ firOpBuilder.create<mlir::arith::SubIOp>(loc, zero, loopStepVar);
+ mlir::Value incr = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isDownwards, negStep, loopStepVar);
+ mlir::Value lb = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isDownwards, loopUBVar, loopLBVar);
+ mlir::Value ub = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isDownwards, loopLBVar, loopUBVar);
+
+ // Compute the trip count assuming lb <= ub. This guarantees that the result
+ // is non-negative and we can use unsigned arithmetic.
+ mlir::Value span = firOpBuilder.create<mlir::arith::SubIOp>(
+ loc, ub, lb, ::mlir::arith::IntegerOverflowFlags::nuw);
+ mlir::Value tcMinusOne =
+ firOpBuilder.create<mlir::arith::DivUIOp>(loc, span, incr);
+ mlir::Value tcIfLooping = firOpBuilder.create<mlir::arith::AddIOp>(
+ loc, tcMinusOne, one, ::mlir::arith::IntegerOverflowFlags::nuw);
+
+ // Fall back to 0 if lb > ub
+ mlir::Value isZeroTC = firOpBuilder.create<mlir::arith::CmpIOp>(
+ loc, mlir::arith::CmpIPredicate::slt, ub, lb);
+ mlir::Value tripcount = firOpBuilder.create<mlir::arith::SelectOp>(
+ loc, isZeroTC, zero, tcIfLooping);
+
+ // Create the CLI handle.
+ auto newcli = firOpBuilder.create<mlir::omp::NewCliOp>(loc);
+ mlir::Value cli = newcli.getResult();
+
+ auto ivCallback = [&](mlir::Operation *op)
+ -> llvm::SmallVector<const Fortran::semantics::Symbol *> {
+ mlir::Region ®ion = op->getRegion(0);
+
+ // Create the op's region skeleton (BB taking the iv as argument)
+ firOpBuilder.createBlock(®ion, {}, {loopVarType}, {loc});
+
+ // Compute the value of the loop variable from the logical iteration number.
+ mlir::Value natIterNum = fir::getBase(region.front().getArgument(0));
+ mlir::Value scaled =
+ firOpBuilder.create<mlir::arith::MulIOp>(loc, natIterNum, loopStepVar);
+ mlir::Value userVal =
+ firOpBuilder.create<mlir::arith::AddIOp>(loc, loopLBVar, scaled);
+
+ // The argument is not currently in memory, so make a temporary for the
+ // argument, and store it there, then bind that location to the argument.
+ mlir::Operation *storeOp =
+ createAndSetPrivatizedLoopVar(converter, loc, userVal, iv);
+
+ firOpBuilder.setInsertionPointAfter(storeOp);
+ return {iv};
+ };
+
+ // Create the omp.canonical_loop operation
+ auto canonLoop = genOpWithBody<mlir::omp::CanonicalLoopOp>(
+ OpWithBodyGenInfo(converter, symTable, semaCtx, loc, nestedEval,
+ directive)
+ .setClauses(&item->clauses)
+ .setDataSharingProcessor(&dsp)
+ .setGenRegionEntryCb(ivCallback),
+ queue, item, tripcount, cli);
+
+ firOpBuilder.setInsertionPointAfter(canonLoop);
+ return canonLoop;
+}
+
+static void genUnrollOp(Fortran::lower::AbstractConverter &converter,
+ Fortran::lower::SymMap &symTable,
+ lower::StatementContext &stmtCtx,
+ Fortran::semantics::SemanticsContext &semaCtx,
+ Fortran::lower::pft::Evaluation &eval,
+ mlir::Location loc, const ConstructQueue &queue,
+ ConstructQueue::const_iterator item) {
+ fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+ mlir::omp::LoopRelatedClauseOps loopInfo;
+ llvm::SmallVector<const semantics::Symbol *> iv;
+ collectLoopRelatedInfo(converter, loc, eval, item->clauses, loopInfo, iv);
+
+ // Clauses for unrolling not yet implemnted
+ ClauseProcessor cp(converter, semaCtx, item->clauses);
+ cp.processTODO<clause::Partial, clause::Full>(
+ loc, llvm::omp::Directive::OMPD_unroll);
+
+ // Even though unroll does not support data-sharing clauses, but this is
+ // required to fill the symbol table.
+ DataSharingProcessor dsp(converter, semaCtx, item->clauses, eval,
+ /*shouldCollectPreDeterminedSymbols=*/true,
+ /*useDelayedPrivatization=*/false, symTable);
+ dsp.processStep1();
+
+ // Emit the associated loop
+ auto canonLoop =
+ genCanonicalLoopOp(converter, symTable, semaCtx, eval, loc, queue, item,
+ iv, llvm::omp::Directive::OMPD_unroll, dsp);
+
+ // Apply unrolling to it
+ auto cli = canonLoop.getCli();
+ firOpBuilder.create<mlir::omp::UnrollHeuristicOp>(loc, cli);
+}
+
static mlir::omp::MaskedOp
genMaskedOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
lower::StatementContext &stmtCtx,
@@ -3516,12 +3671,9 @@ static void genOMPDispatch(lower::AbstractConverter &converter,
newOp = genTeamsOp(converter, symTable, stmtCtx, semaCtx, eval, loc, queue,
item);
break;
- case llvm::omp::Directive::OMPD_tile:
- case llvm::omp::Directive::OMPD_unroll: {
- unsigned version = semaCtx.langOptions().OpenMPVersion;
- TODO(loc, "Unhandled loop directive (" +
- llvm::omp::getOpenMPDirectiveName(dir, version) + ")");
- }
+ case llvm::omp::Directive::OMPD_unroll:
+ genUnrollOp(converter, symTable, stmtCtx, semaCtx, eval, loc, queue, item);
+ break;
// case llvm::omp::Directive::OMPD_workdistribute:
case llvm::omp::Directive::OMPD_workshare:
newOp = genWorkshareOp(converter, symTable, stmtCtx, semaCtx, eval, loc,
diff --git a/flang/test/Lower/OpenMP/unroll-heuristic01.f90 b/flang/test/Lower/OpenMP/unroll-heuristic01.f90
new file mode 100644
index 0000000000000..a5f5c003b8a7c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/unroll-heuristic01.f90
@@ -0,0 +1,39 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=51 -o - %s 2>&1 | FileCheck %s
+
+
+subroutine omp_unroll_heuristic01(lb, ub, inc)
+ integer res, i, lb, ub, inc
+
+ !$omp unroll
+ do i = lb, ub, inc
+ res = i
+ end do
+ !$omp end unroll
+
+end subroutine omp_unroll_heuristic01
+
+
+!CHECK-LABEL: func.func @_QPomp_unroll_heuristic01(
+!CHECK: %c0_i32 = arith.constant 0 : i32
+!CHECK-NEXT: %c1_i32 = arith.constant 1 : i32
+!CHECK-NEXT: %13 = arith.cmpi slt, %12, %c0_i32 : i32
+!CHECK-NEXT: %14 = arith.subi %c0_i32, %12 : i32
+!CHECK-NEXT: %15 = arith.select %13, %14, %12 : i32
+!CHECK-NEXT: %16 = arith.select %13, %11, %10 : i32
+!CHECK-NEXT: %17 = arith.select %13, %10, %11 : i32
+!CHECK-NEXT: %18 = arith.subi %17, %16 overflow<nuw> : i32
+!CHECK-NEXT: %19 = arith.divui %18, %15 : i32
+!CHECK-NEXT: %20 = arith.addi %19, %c1_i32 overflow<nuw> : i32
+!CHECK-NEXT: %21 = arith.cmpi slt, %17, %16 : i32
+!CHECK-NEXT: %22 = arith.select %21, %c0_i32, %20 : i32
+!CHECK-NEXT: %canonloop_s0 = omp.new_cli
+!CHECK-NEXT: omp.canonical_loop(%canonloop_s0) %iv : i32 in range(%22) {
+!CHECK-NEXT: %23 = arith.muli %iv, %12 : i32
+!CHECK-NEXT: %24 = arith.addi %10, %23 : i32
+!CHECK-NEXT: hlfir.assign %24 to %9#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: %25 = fir.load %9#0 : !fir.ref<i32>
+!CHECK-NEXT: hlfir.assign %25 to %6#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: omp.terminator
+!CHECK-NEXT: }
+!CHECK-NEXT: omp.unroll_heuristic(%canonloop_s0)
+!CHECK-NEXT: return
diff --git a/flang/test/Lower/OpenMP/unroll-heuristic02.f90 b/flang/test/Lower/OpenMP/unroll-heuristic02.f90
new file mode 100644
index 0000000000000..669f185f910c4
--- /dev/null
+++ b/flang/test/Lower/OpenMP/unroll-heuristic02.f90
@@ -0,0 +1,70 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=51 -o - %s 2>&1 | FileCheck %s
+
+
+subroutine omp_unroll_heuristic_nested02(outer_lb, outer_ub, outer_inc, inner_lb, inner_ub, inner_inc)
+ integer res, i, j, inner_lb, inner_ub, inner_inc, outer_lb, outer_ub, outer_inc
+
+ !$omp unroll
+ do i = outer_lb, outer_ub, outer_inc
+ !$omp unroll
+ do j = inner_lb, inner_ub, inner_inc
+ res = i + j
+ end do
+ !$omp end unroll
+ end do
+ !$omp end unroll
+
+end subroutine omp_unroll_heuristic_nested02
+
+
+!CHECK-LABEL: func.func @_QPomp_unroll_heuristic_nested02(%arg0: !fir.ref<i32> {fir.bindc_name = "outer_lb"}, %arg1: !fir.ref<i32> {fir.bindc_name = "outer_ub"}, %arg2: !fir.ref<i32> {fir.bindc_name = "outer_inc"}, %arg3: !fir.ref<i32> {fir.bindc_name = "inner_lb"}, %arg4: !fir.ref<i32> {fir.bindc_name = "inner_ub"}, %arg5: !fir.ref<i32> {fir.bindc_name = "inner_inc"}) {
+!CHECK: %c0_i32 = arith.constant 0 : i32
+!CHECK-NEXT: %c1_i32 = arith.constant 1 : i32
+!CHECK-NEXT: %18 = arith.cmpi slt, %17, %c0_i32 : i32
+!CHECK-NEXT: %19 = arith.subi %c0_i32, %17 : i32
+!CHECK-NEXT: %20 = arith.select %18, %19, %17 : i32
+!CHECK-NEXT: %21 = arith.select %18, %16, %15 : i32
+!CHECK-NEXT: %22 = arith.select %18, %15, %16 : i32
+!CHECK-NEXT: %23 = arith.subi %22, %21 overflow<nuw> : i32
+!CHECK-NEXT: %24 = arith.divui %23, %20 : i32
+!CHECK-NEXT: %25 = arith.addi %24, %c1_i32 overflow<nuw> : i32
+!CHECK-NEXT: %26 = arith.cmpi slt, %22, %21 : i32
+!CHECK-NEXT: %27 = arith.select %26, %c0_i32, %25 : i32
+!CHECK-NEXT: %canonloop_s0 = omp.new_cli
+!CHECK-NEXT: omp.canonical_loop(%canonloop_s0) %iv : i32 in range(%27) {
+!CHECK-NEXT: %28 = arith.muli %iv, %17 : i32
+!CHECK-NEXT: %29 = arith.addi %15, %28 : i32
+!CHECK-NEXT: hlfir.assign %29 to %14#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: %30 = fir.alloca i32 {bindc_name = "j", pinned, uniq_name = "_QFomp_unroll_heuristic_nested02Ej"}
+!CHECK-NEXT: %31:2 = hlfir.declare %30 {uniq_name = "_QFomp_unroll_heuristic_nested02Ej"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK-NEXT: %32 = fir.load %4#0 : !fir.ref<i32>
+!CHECK-NEXT: %33 = fir.load %5#0 : !fir.ref<i32>
+!CHECK-NEXT: %34 = fir.load %3#0 : !fir.ref<i32>
+!CHECK-NEXT: %c0_i32_0 = arith.constant 0 : i32
+!CHECK-NEXT: %c1_i32_1 = arith.constant 1 : i32
+!CHECK-NEXT: %35 = arith.cmpi slt, %34, %c0_i32_0 : i32
+!CHECK-NEXT: %36 = arith.subi %c0_i32_0, %34 : i32
+!CHECK-NEXT: %37 = arith.select %35, %36, %34 : i32
+!CHECK-NEXT: %38 = arith.select %35, %33, %32 : i32
+!CHECK-NEXT: %39 = arith.select %35, %32, %33 : i32
+!CHECK-NEXT: %40 = arith.subi %39, %38 overflow<nuw> : i32
+!CHECK-NEXT: %41 = arith.divui %40, %37 : i32
+!CHECK-NEXT: %42 = arith.addi %41, %c1_i32_1 overflow<nuw> : i32
+!CHECK-NEXT: %43 = arith.cmpi slt, %39, %38 : i32
+!CHECK-NEXT: %44 = arith.select %43, %c0_i32_0, %42 : i32
+!CHECK-NEXT: %canonloop_s0_s0 = omp.new_cli
+!CHECK-NEXT: omp.canonical_loop(%canonloop_s0_s0) %iv_2 : i32 in range(%44) {
+!CHECK-NEXT: %45 = arith.muli %iv_2, %34 : i32
+!CHECK-NEXT: %46 = arith.addi %32, %45 : i32
+!CHECK-NEXT: hlfir.assign %46 to %31#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: %47 = fir.load %14#0 : !fir.ref<i32>
+!CHECK-NEXT: %48 = fir.load %31#0 : !fir.ref<i32>
+!CHECK-NEXT: %49 = arith.addi %47, %48 : i32
+!CHECK-NEXT: hlfir.assign %49 to %12#0 : i32, !fir.ref<i32>
+!CHECK-NEXT: omp.terminator
+!CHECK-NEXT: }
+!CHECK-NEXT: omp.unroll_heuristic(%canonloop_s0_s0)
+!CHECK-NEXT: omp.terminator
+!CHECK-NEXT: }
+!CHECK-NEXT: omp.unroll_heuristic(%canonloop_s0)
+!CHECK-NEXT: return
diff --git a/flang/test/Parser/OpenMP/unroll-heuristic.f90 b/flang/test/Parser/OpenMP/unroll-heuristic.f90
new file mode 100644
index 0000000000000..2f589af0c83ca
--- /dev/null
+++ b/flang/test/Parser/OpenMP/unroll-heuristic.f90
@@ -0,0 +1,43 @@
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=51 %s -fdebug-unparse | FileCheck --check-prefix=UNPARSE %s
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=51 %s -fdebug-dump-parse-tree | FileCheck --check-prefix=PTREE %s
+
+subroutine openmp_parse_unroll_heuristic
+ integer i
+
+ !$omp unroll
+ do i = 1, 100
+ call func(i)
+ end do
+ !$omp end unroll
+END subroutine openmp_parse_unroll_heuristic
+
+
+!UNPARSE: !$OMP UNROLL
+!UNPARSE-NEXT: DO i=1_4,100_4
+!UNPARSE-NEXT: CALL func(i)
+!UNPARSE-NEXT: END DO
+!UNPARSE-NEXT: !$OMP END UNROLL
+
+!PTREE: OpenMPConstruct -> OpenMPLoopConstruct
+!PTREE-NEXT: | OmpBeginLoopDirective
+!PTREE-NEXT: | | OmpLoopDirective -> llvm::omp::Directive = unroll
+!PTREE-NEXT: | | OmpClauseList ->
+!PTREE-NEXT: | DoConstruct
+!PTREE-NEXT: | | NonLabelDoStmt
+!PTREE-NEXT: | | | LoopControl -> LoopBounds
+!PTREE-NEXT: | | | | Scalar -> Name = 'i'
+!PTREE-NEXT: | | | | Scalar -> Expr = '1_4'
+!PTREE-NEXT: | | | | | LiteralConstant -> IntLiteralConstant = '1'
+!PTREE-NEXT: | | | | Scalar -> Expr = '100_4'
+!PTREE-NEXT: | | | | | LiteralConstant -> IntLiteralConstant = '100'
+!PTREE-NEXT: | | Block
+!PTREE-NEXT: | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> CallStmt = 'CALL func(i)'
+!PTREE-NEXT: | | | | | | Call
+!PTREE-NEXT: | | | | | ProcedureDesignator -> Name = 'func'
+!PTREE-NEXT: | | | | | ActualArgSpec
+!PTREE-NEXT: | | | | | | ActualArg -> Expr = 'i'
+!PTREE-NEXT: | | | | | | | Designator -> DataRef -> Name = 'i'
+!PTREE-NEXT: | | EndDoStmt ->
+!PTREE-NEXT: | OmpEndLoopDirective
+!PTREE-NEXT: | | OmpLoopDirective -> llvm::omp::Directive = unroll
+!PTREE-NEXT: | | OmpClauseList ->
diff --git a/flang/test/Parser/OpenMP/unroll.f90 b/flang/test/Parser/OpenMP/unroll-partial.f90
similarity index 100%
rename from flang/test/Parser/OpenMP/unroll.f90
rename to flang/test/Parser/OpenMP/unroll-partial.f90
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
index f9a85626a3f14..faf820dcfdb29 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
@@ -15,14 +15,10 @@
#ifndef MLIR_DIALECT_OPENMP_OPENMPCLAUSEOPERANDS_H_
#define MLIR_DIALECT_OPENMP_OPENMPCLAUSEOPERANDS_H_
+#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "llvm/ADT/SmallVector.h"
-#include "mlir/Dialect/OpenMP/OpenMPOpsEnums.h.inc"
-
-#define GET_ATTRDEF_CLASSES
-#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.h.inc"
-
#include "mlir/Dialect/OpenMP/OpenMPClauseOps.h.inc"
namespace mlir {
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h
index 248ac2eb72c61..0a844fc2380bf 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h
@@ -16,6 +16,7 @@
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/OpenACCMPCommon/Interfaces/AtomicInterfaces.h"
#include "mlir/Dialect/OpenACCMPCommon/Interfaces/OpenACCMPOpsInterfaces.h"
+#include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
#include "mlir/IR/Dialect.h"
#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/PatternMatch.h"
@@ -24,6 +25,11 @@
#include "mlir/Interfaces/SideEffectInterfaces.h"
#include "llvm/Frontend/OpenMP/OMPDeviceConstants.h"
+namespace mlir::omp {
+/// Find the omp.new_cli, generator, and consumer of a canonical loop info.
+std::tuple<NewCliOp, OpOperand *, OpOperand *> decodeCli(mlir::Value cli);
+} // namespace mlir::omp
+
#define GET_TYPEDEF_CLASSES
#include "mlir/Dialect/OpenMP/OpenMPOpsTypes.h.inc"
@@ -33,8 +39,6 @@
#include "mlir/Dialect/OpenMP/OpenMPTypeInterfaces.h.inc"
-#include "mlir/Dialect/OpenMP/OpenMPInterfaces.h"
-
#define GET_OP_CLASSES
#include "mlir/Dialect/OpenMP/OpenMPOps.h.inc"
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h
index 989ab1710c211..bc9534974d21f 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPInterfaces.h
@@ -14,6 +14,7 @@
#define MLIR_DIALECT_OPENMP_OPENMPINTERFACES_H_
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
+#include "mlir/Dialect/OpenMP/OpenMPOpsAttributes.h"
#include "mlir/IR/Dialect.h"
#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/PatternMatch.h"
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td
index f3dd44d2c0717..bbcfb87fa03c6 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td
@@ -204,4 +204,15 @@ class OpenMP_Op<string mnemonic, list<Trait> traits = [],
let regions = !if(singleRegion, (region AnyRegion:$region), (region));
}
+
+// Base class for OpenMP loop transformations (that either consume or generate
+// loops)
+//
+// Doesn't actually create a C++ base class (only defines default values for
+// tablegen classes that derive from this). Use LoopTransformationInterface
+// instead for common operations.
+class OpenMPTransform_Op<string mnemonic, list<Trait> traits = []> :
+ OpenMP_Op<mnemonic, !listconcat([DeclareOpInterfaceMethods<LoopTransformationInterface>], traits) > {
+}
+
#endif // OPENMP_OP_BASE
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index ac80926053a2d..8641c9b8150ee 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -22,6 +22,7 @@ include "mlir/Dialect/OpenMP/OpenMPOpBase.td"
include "mlir/Interfaces/ControlFlowInterfaces.td"
include "mlir/Interfaces/SideEffectInterfaces.td"
include "mlir/IR/EnumAttr.td"
+include "mlir/IR/OpAsmInterface.td"
include "mlir/IR/OpBase.td"
include "mlir/IR/SymbolInterfaces.td"
@@ -356,6 +357,212 @@ def SingleOp : OpenMP_Op<"single", traits = [
let hasVerifier = 1;
}
+//===---...
[truncated]
|
Thank you Michael, nice to see this moving forward! Could you split this PR, to make it more accessible to review? Maybe one PR for generic MLIR infrastructure additions, one for OpenMP MLIR tablegen definitions / headers, one for MLIR to LLVM IR translation and another for Flang lowering changes, for example? One thing I'm thinking is that we'd probably want to put canonical loop support in Flang under an "experimental" compiler option, to toggle between the existing So, I think it'd be nice to pass a flag to be used in Flang lowering to select either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the big patch!
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Meinersbur! Took awhile but finally went over the whole PR! Some small comments but I will leave approval to people who have been involved in canonical loop discussions. Learned quite a bit from the PR though.
flang/lib/Lower/OpenMP/OpenMP.cpp
Outdated
if (nestedEval.getIf<parser::DoConstruct>()->IsDoConcurrent()) { | ||
TODO(loc, "Do Concurrent in unroll construct"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be moved to genUnrollOp
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenMP 6.0 specifies DO CONCURRENT only with the !omp loop
construct. At this stage will not support it for any omp.canonical_loop
. Changed the TODO message.
flang/lib/Lower/OpenMP/OpenMP.cpp
Outdated
if (bounds->step) { | ||
return fir::getBase( | ||
converter.genExprValue(*semantics::GetExpr(bounds->step), stmtCtx)); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need for the else
here.
mlir::omp::LoopRelatedClauseOps loopInfo; | ||
llvm::SmallVector<const semantics::Symbol *> iv; | ||
collectLoopRelatedInfo(converter, loc, eval, item->clauses, loopInfo, iv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be moved to genCanonicalLoopOp
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of loops to collect info for depends on the construct (e.g. unroll: 1, tile: number of elements in the sizes
clause). This is not known by genCanonicalLoopOp
. One could add it as a paramter to it, but since collected loop info will eventually also contain generated loops from other constructs, there will probably be a utility function for that that calls collectLoopRelatedInfo and followed by genCanonicalLoopOp, or uses an generated loop and for all returns an array of "semantic loop" references of CanonicalLoopInfo values for each expected loop. At this point it is too much speculation on how the final result will look like.
|
||
|
||
!CHECK-LABEL: func.func @_QPomp_unroll_heuristic01( | ||
!CHECK: %c0_i32 = arith.constant 0 : i32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you capture SSA value names using descriptive names (e.g. trip_count
, ub
, lb
, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not possible in MLIR: SSA value names are always automatically generated and cannot be user-provided. In this case there is a special handler for arith.constant
that generates the name out of c
(for constant), the value of the constant (0
), and the type (i32
). Without such a special handler, an SSA name just gets a sequential number id. For values of type omp.cli
I also defined such a special handler that generates names based on the nesting, see NewCliOp::getAsmResultNames
.
let description = [{ | ||
All loops that conform to OpenMP's definition of a canonical loop can be | ||
simplified to a CanonicalLoopOp. In particular, there are no loop-carried | ||
variables and the number of iterations it will execute is know before the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variables and the number of iterations it will execute is know before the | |
variables and the number of iterations it will execute is known before the |
if (generatees.first <= opnum && | ||
opnum < generatees.first + generatees.second) { | ||
assert(!gen && "Each CLI may have at most one consumer"); | ||
gen = &use; | ||
} else if (applyees.first <= opnum && | ||
opnum < applyees.first + applyees.second) { | ||
assert(!cons && "Each CLI may have at most one def"); | ||
cons = &use; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add isApplyee(unsigned operandIdx)
and isGeneratee(unsigned operandIdx)
to LoopTransformationInterface
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Currently the generated `.h.inc` files are included coarse-grained header files that just happen to work in the current source. But if at another header, call it `A.h`, a definition from e.g. "OpenMPOpsEnums.h.inc" is needed, it cannot be included there because it `A.h` is also included in `B.h` that uses `OpenMPClauseOperands.h` which already includes `OpenMPOpsEnums.h.inc`. So the content of `OpenMPOpsEnums.h.inc` appears twice in the translation unit result in a compile failure because the same enum cannot be defined twice. This patch tries to use more fine-grained include header for generated content protected by header guards, so `OpenMPOpsEnums.h` can be included when ever needed. Some is done with `OpenMPOpsAttributes.h`. Needed for #144785. Also fixes the recursive #include of `OpenMPDialect.h` and `OpenMPInterfaces.h`. Patch extracted out of #144785
…troduce isApplyee/isGeneratee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me with one nit. Please wait for another approval.
llvm::omp::Directive directive, DataSharingProcessor &dsp) { | ||
fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder(); | ||
|
||
assert(ivs.size() == 1 && "Nested loops not yet implemented"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are nested loops already caught elsewhere? Otherwise a TODO message might be more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is currently only called by genUnrollOp which only supports a single loop, and can only call genCanonicalLoopOp
with a single loop. Anything that would use genCanonicalLoopOp for nests with multiple loops with need to update the implementation: Probably a genCanonicalLoopNest
that calls genCanonicalLoopOp
multiple times, once for each loop. That is, calling genCanonicalLoopOp
with more than one iv would be a API usage error1, not a "not yet implemented".
Footnotes
-
Probably
genCanonicalLoopNest
gets theArrayRef
and passes each matchingsemantics::Symbol *
to a call ofgenCanonicalLoopOp
, we will see. ↩
Add the supporting OpenMP Dialect operations, types, and interfaces for modelling MLIR Operations: * omp.newcli * omp.canonical_loop MLIR Types: * !omp.cli MLIR Interfaces: * LoopTransformationInterface As a first loop transformations to be able to use these new operation in follow-up PRs (#144785) * omp.unroll_heuristic
Add support for
!$omp unroll
in Flang and basic MLIRomp.canonical_loop
modeling.First step to add
omp.canonical_loop
modeling to the MLIR OpenMP dialect with the goal of being more general than the currentomp.loop_nest
approach:This patch is functional end-to-end and adds support for
!$omp unroll
to Flang.!$omp unroll
is lowered toomp.new_cli
,omp.canonical_loop
, andomp.unroll_heuristic
in MLIR, which are lowered to LLVM-IR using the OpenMPIRBuilder (https://reviews.llvm.org/D107764).This is a continuation of #71712 and a cherry-pick of a more comprehensive patch (but taking too long) that I was working on. Currently unroll only works standalone and cannot yet combined with other directive. The following features still have to be implemented:
apply
clauseomp.loop_nest
modeling (data-sharing attributes, reductions, target/host-eval, ..)do
,distribute
,simd
,taskloop
,loop
)unroll
construct (OpenMP 5.1):full
andpartial
tile
construct (OpenMP 5.1)interchange
construct (OpenMP 6.0)stripe
construct (OpenMP 6.0)reverse
construct (OpenMP 6.0)fuse
construct (OpenMP 6.0)split
construct (OpenMP 6.0)PR #143715 adds support for the tile construct if it is the only transformation applied to a loop nest before a non-transformation loop-associated construct. It does to by adding a
tiles
clause to theomp.loop_nest
operation, which has to be handled the LoopWrapperInterface-implemention operation.Related:
PR stack:
!$omp unroll
andomp.unroll_heuristic
#144785 (this PR)