Skip to content

[mlir][scf] fix getSuccessorRegions func in scf.forall #147491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cxy-1993
Copy link
Contributor

@cxy-1993 cxy-1993 commented Jul 8, 2025

In accordance with the semantics of forall, its body is executed in parallel by multiple threads. We should not expect to branch back into the forall body after the region's execution is complete.

In accordance with the semantics of forall, its body is executed in
parallel by multiple threads. We should not expect to branch back into
the forall body after the region's execution is complete.
@llvmbot
Copy link
Member

llvmbot commented Jul 8, 2025

@llvm/pr-subscribers-mlir-scf

@llvm/pr-subscribers-mlir

Author: donald chen (cxy-1993)

Changes

In accordance with the semantics of forall, its body is executed in parallel by multiple threads. We should not expect to branch back into the forall body after the region's execution is complete.


Full diff: https://github.com/llvm/llvm-project/pull/147491.diff

1 Files Affected:

  • (modified) mlir/lib/Dialect/SCF/IR/SCF.cpp (+7-5)
diff --git a/mlir/lib/Dialect/SCF/IR/SCF.cpp b/mlir/lib/Dialect/SCF/IR/SCF.cpp
index 5a3bd984530db..2e0a7b85af756 100644
--- a/mlir/lib/Dialect/SCF/IR/SCF.cpp
+++ b/mlir/lib/Dialect/SCF/IR/SCF.cpp
@@ -1927,11 +1927,13 @@ void ForallOp::getCanonicalizationPatterns(RewritePatternSet &results,
 /// not a constant.
 void ForallOp::getSuccessorRegions(RegionBranchPoint point,
                                    SmallVectorImpl<RegionSuccessor> &regions) {
-  // Both the operation itself and the region may be branching into the body or
-  // back into the operation itself. It is possible for loop not to enter the
-  // body.
-  regions.push_back(RegionSuccessor(&getRegion()));
-  regions.push_back(RegionSuccessor());
+  // In accordance with the semantics of forall, its body is executed in
+  // parallel by multiple threads. We should not expect to branch back into
+  // the forall body after the region's execution is complete.
+  if (point.isParent())
+    regions.push_back(RegionSuccessor(&getRegion()));
+  else
+    regions.push_back(RegionSuccessor());
 }
 
 //===----------------------------------------------------------------------===//

Copy link
Member

@ftynse ftynse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this stemming from some real use case or bug?

The control flow interfaces are not modeling parallelism currently, and this is a bit tricky. In particular, control flow interfaces are used by dataflow analyses to understand whether a block may be executed before or after another block, including a different "instance" of the same block. This may be the case here. I'd like to better think through this before we decide one way or another.

@cxy-1993
Copy link
Contributor Author

cxy-1993 commented Jul 8, 2025

Is this stemming from some real use case or bug?

The control flow interfaces are not modeling parallelism currently, and this is a bit tricky. In particular, control flow interfaces are used by dataflow analyses to understand whether a block may be executed before or after another block, including a different "instance" of the same block. This may be the case here. I'd like to better think through this before we decide one way or another.

We're using dataflow analysis on a very large program, and the compilation time is excessively long because scf.forall assumes the body may execute after body.

The current implementation doesn't cause functional issues, but it's conservative, increasing compilation time and potentially making the analysis overly cautious. The forall semantic is meant to indicate parallel execution across multiple threads, so for a single thread, it's more reasonable for the forall body to exit after execution (similar to an outlined gpu program).

@joker-eph
Copy link
Collaborator

There can't be any "data" flowing from one instance of a parallel block to another.
I don't think they are ordered and having the dynamic iteration N+1 being the successor or the dynamic iteration N does not really makes sense to me?

I don't quite get why this was modeled this way in the first place though, I may be missing something here.

@matthias-springer
Copy link
Member

For reference, I found these old discussions / PRs:

The last PR may have accidentally changed the behavior.

@cxy-1993
Copy link
Contributor Author

Ping, looks like we've reached a consensus?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants