Skip to content

release/20.x: [CoroSplit] Always erase lifetime intrinsics for spilled allocas (#142551) #147248

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10,000 commits into from

Conversation

NewSigma
Copy link
Contributor

@NewSigma NewSigma commented Jul 7, 2025

Backport #142551 for #124612

huntergr-arm and others added 30 commits May 14, 2025 15:01
…lvm#138298)

783a846 changed VPScalarIVStepsRecipe to take 3 arguments (adding
VF explicitly) instead of 2, but didn't change the corresponding
pattern matcher.

This matcher was only used in vputils::isHeaderMask, and no test
ever reached that function with a ScalarIVSteps recipe for the
value being matched -- it was always a WideCanonicalIV. So the
matcher bailed out immediately before checking arguments and
asserting that the number of arguments in the recipe was the
same provided by the matcher.

Since the constructors for ScalarIVSteps take 3 values, we should
be safe to update the matcher and guard it with a dedicated gtest.

m_CanonicalIV() on the other hand is removed; as a phi recipe it
may not have a consistent number of arguments to match, only
requiring one (the start value) when being constructed with the
assumption that a second incoming value is added for the backedge
later. In order to keep the matcher we would need to add multiple
matchers with different numbers of arguments for it depending on
what phase of vplan construction we were in, and ensure that we
never reorder matcher usage vs. vplan transformation. Since the
main IR PatternMatch.h doesn't contain any matchers for PHI nodes,
I think we can just remove it and match via m_Specific() using the
VPValue we get from Plan.getCanonicalIV().
When implementing the parsing for OpenACC I ensured that we could always
continue to do diagnostics/etc.  For the most part, I was consistent
with that assumption throughout clause Sema, but in 3 cases I left in
some unreachables for cases where this would happen. I've now properly
handled all 3 in a reasonable way.

Fixes: llvm#139894
)

Flang currently adds loop metadata to a conditional branch in the loop
preheader, while clang adds it to the loop latch's branch instruction.
Langref says:

> Currently, loop metadata is implemented as metadata attached to the
branch instruction in the loop latch block.
>
> https://llvm.org/docs/LangRef.html#llvm-loop

I misread langref a couple times, but I think this is the appropriate
branch op for the LoopAnnotationAttr. In a couple examples I found that
the metadata was lost entirely during canonicalization. This patch makes
the codegen look more like clang's and the annotations persist through
codegen.

* current clang: https://godbolt.org/z/8WhbcrnG3
* current flang: https://godbolt.org/z/TrPboqqcn
The GDB pretty printer test works just fine with Clang in the CI now,
except that it breaks (not exactly unexpectedly) with optimizations
enabled.
The passed indices have to be constant integers anyway, which we verify
before creating the ShuffleVectorExpr. Use the value we create there and
save the indices using a ConstantExpr instead. This way, we don't have
to evaluate the args every time we call getShuffleMaskIdx().
There were two crashes that have the same root cause: not correctly
handling unexpected tokens. In one case, we were failing to return early
which caused us to parse a paren as a regular token instead of a special
token, causing an assertion. The other case was failing to commit or
revert the tentative parse action when not getting a paren when one was
expected.

Fixes llvm#139665
…m#139784)

Static analysis flagged the use of Left.size() because we just moved out
of Left and that would be undefined behavior. Fix is to take the size
and store it in a local variable instead.
…ed (llvm#139820)

Fixes llvm#139779.

The bug was introduced in llvm#137355 in `SymbolConjured::getStmt`, when
trying to obtain a statement for a CFG initializer without an
initializer. This commit adds a null check before access.
This patch updates `createWriteOrMaskedWrite` to make it consistent with
`createReadOrMaskedRead`.

Before diving into the details: note that these utilities are currently
implemented in different files — "VectorUtils.cpp" (Vector) and
"Vectorization.cpp" (Linalg). In a subsequent patch, I plan to move
`createWriteOrMaskedWrite` into "VectorUtils.cpp".

SUMMARY OF CHANGES:

The main change is to remove the logic that creates the destination
tensor, which previously looked like:
```cpp
  Value dest = builder.create<tensor::EmptyOp>(loc, destSizes,
                                               inputType.getElementType());
```

With this patch, createWriteOrMaskedWrite now simply generates:
```mlir
  %res = vector.transfer_write %vectorToStore into %dest
```

This replaces the previous form:
```mlir
  %dest = tensor.empty(%destSizes)
  %res = vector.transfer_write %vectorToStore into %dest
```

In other words, the destination value `%dest` is now passed as an input
parameter. This makes `createWriteOrMaskedWrite` re-usable in contexts
where the destination tensor is already known — for example, in
`vectorizeAsInsertSliceOp`, which I will update in a follow-up patch.

OTHER CHANGES:

* Added comments and clarified TODOs.

* Updated tests: since destination sizes are now computed independently
  inside `createWriteOrMaskedWrite`, some additional `tensor.dim` ops
  appear. These will be cleaned up by CSE + canonicalization.
The `CacheStream::commit()` function (defined in Caching.cpp) deletes
the underlying raw stream. Some output streamers may hold a pointer
to it, which then will outlive the stream object.
In particular, MCAsmStreamer keeps the pointer to the raw stream
though a separate `formatted_raw_stream` object, which buffers data and
there is no path to explicitly flush this data. Before this change,
the buffered data was flushed during the MCAsmStreamer destructor.
After llvm#136121, this happened after the `commit()` function is called.
Therefore, it caused a crash because the `formatted_raw_stream` object
tries to write the buffered data into a deleted raw stream. Even if
we don't delete the stream to avoid the crash, it would be too late
as the output stream cannot accept data after commit().

Fixes: llvm#138194.
…vm#139874)

In `checkOrAndOpImpliedByOther`, replacing an operand of a disjoint or
is unsafe: https://alive2.llvm.org/ce/z/4R4hxN
This patch performs the simplification directly, to avoid miscompilation
and unnecessary canonicalization.

Closes llvm#137937.
## Purpose
Add proper preprocessor guards for all `dump()` methods in the LLVM
support library. This change ensures these methods are not part of the
public ABI for release builds.

## Overview
* Annotates all `dump` methods in Support and ADT source with the
`LLVM_DUMP_METHOD` macro.
* Conditionally includes all `dump` method definitions in Support and
ADT source so they are only present on debug/assert builds and when
`LLVM_ENABLE_DUMP` is explicitly defined.

NOTE: For many of these `dump` methods, the implementation was already
properly guarded but the declaration in the header file was not.

## Background
This issue was raised in comments on llvm#136629. I am addressing it as a
separate change since it is independent from the changes being made in
that PR.

According to [this
documentation](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/Compiler.h#L637),
`dump` methods should be annotated with `LLVM_DUMP_METHOD` and
conditionally included as follows:
```
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void dump() const;
#endif
```

## Validation
* Local release build succeeds.
* CI
This patch fixes:

  mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:1544:14: error:
  unused variable 'destType' [-Werror,-Wunused-variable]
Add dedicated unit tests for the protocol enum types.
…)"

This reverts commit 97aa01b and
7e7871d due to failures on windows.
This commit documents the process of specifying values for the analyzer
options and checker options implemented in the static analyzer, and adds
a script which includes the documentation of the analyzer options (which
was previously only available through a command-line flag) in the
RST-based web documentation.
…lvm#125056)

This adds a token for a forward slash to the token definition list and
the methods to `AsmParser::parseSlash()` and
`AsmParser::parseOptionalSlash()`, similar to other tokens used as
operators (e.g., star, plus, etc.). This allows implementations of
attributes that contain arithmetic expressions to support operators with
a forward slash, e.g., a division.

The newly added check tests trigger the parsing of a slash in an
attribute.
…lvm#139838)

Simplify usage of `getTrailingObjects` for OpenACC/OpenMP Clause by
using either the non-templated form for single trailing types or using
the single argument form that returns an ArrayRef/MutableArrayRef.
In heatmap mode, report samples and utilization of the section(s)
between hot text markers `[__hot_start, __hot_end)`.

The intended use is with multi-way splitting where there are several
sections that contain "hot" code (e.g. `.text.warm` with CDSplit).

Addresses the comment on llvm#139193

llvm#139193 (review)

Test Plan: updated heatmap-preagg.test
…9903)

Previously the class `ExplodedGraph` had a data member called `Roots`
which could (in theory) store multiple root nodes -- but in practice
exploded graphs always had at most one root node (zero was possible for
empty and partially copied graphs) and introducing a graph with multiple
roots would've caused severe malfuncitons (e.g. in code that used the
pattern `*roots_begin()` to refer to _the_ root node).

I don't see any practical use case for adding multiple root nodes in a
single graph (this seems to be yet another of the "generalize for the
sake of generalization" decisions which were common in the early history
of the analyzer), so this commit replaces the vector `Roots` with
`ExplodedNode *Root` (which may be null when the graph is empty or under
construction).

Note that the complicated logic of `ExplodedGraph::trim` deserves a
through cleanup, but I left that for a follow-up commit.
Note that the lambda function we are in returns bool, so
FileZero.compare(FileOne) is equivalent to FileZero != FileOne in this
context.
Don't split i64 load/store when we have Zilsd.

In the future, we should enhanced the LoadStoreOptimizer pass to do
this, but this is a good starting point. Even if we support it in
LoadStoreOptimizer, we might still want this for volatile loads/stores
to guarantee the use of Zilsd.
This adds basic support for testing the Transport class and includes
tests for 'Read' and 'Write'.
The instructions are not supported on either 32-bit ELF (due to no
redzone) or 32-bit AIX due to the instructions always using the full
64-bit width of the register inputs.
…133752)

This patch enhances the test coverage of `{std,ranges}::swap_ranges` by
adding larger test cases with 100 elements across different containers.
It also inlines standalone tests for better readability, avoiding
unnecessary navigation.

This patch addresses a follow-up suggestion from PR llvm#121138 to extend
test coverage beyond 3 elements.
Pierre-vh and others added 13 commits May 16, 2025 10:16
Introduces a slight regression in some cases but it'll even out once we
disable the promotion in CGP.
…tion (llvm#140029)

For a dependent variable template specialization, we don't build a
dependent Decl node or a DeclRefExpr to represent it. Instead, we
preserve the UnresolvedLookupExpr until instantiation.

However, this approach isn't ideal for constraint normalization. We
consider the qualifier during profiling, but since that's based on the
written code, it can introduce confusing differences, even when the
expressions resolve to the same declaration.

This change ensures that, if possible, we profile the resolved
declaration instead of its qualifier. For expressions that resolve to
more than one declarations, we still profile its qualifier, as otherwise
it would make us depend on the order of lookup results.

Fixes llvm#139476
Directive was implemented in c87bd2d to support lazy binding and is
emitted for vector PCS functions. It's specific to ELF but is currently
emitted for all binary formats and crashing on non-ELF targets.

Fixes llvm#138260

---------

Co-authored-by: Cullen Rhodes <cullen.rhodes@arm.com>
…eg combine (llvm#140207)

The hasOneUseCheck does not really add anything and makes the combine too
restrictive. Upcoming patches benefit from removing the hasOneUse check.
…0208)

For the majority of cases, this is a neutral or positive change.
There are even testcases that greatly benefit from it, but some regressions are possible.
There is llvm#140040 for GlobalISel that'd need to be fixed but it's only a one instruction regression and I think it can be fixed later.

Solves llvm#64591
The "target-features" function attribute is not currently considered
when adding vscale_range to a function. When +sve/+sme are pushed onto
functions with "#pragma attribute push(+sve/+sme)", the function
potentially misses out on optimizations that rely on vscale_range being
present.
The previous method splits vector data into two halves. shuffle_vector
concatenates the two results into a vector data of original size. This
PR eliminates the use of shuffle_vector.
…llvm#139871)

This enables file_specific_compile_options to take precedence over
ARG_COMPILE_FLAGS. For example, if we add -fno-slp-vectorize to
COMPILE_OPTIONS of a file, the behavior changes as follows:
* Before this PR: -fno-slp-vectorize is overwritten by -O3, resulting in
SLP vectorizer remaining enabled.
* After this PR: -fno-slp-vectorize overwrites -O3, effectively
disabling SLP vectorizer.
Currently coroutine promises are modeled as allocas. This is problematic
because other middle-end passes will assume promise dead after coroutine
suspend, leading to misoptimizations.

I propose the front ends remain free to emit and use allocas to model
coro promise. At CoroEarly, we will replace all uses of promise alloca
with `coro.promise`. Non coroutine passes should only access promise
through `coro.promise`. Then at CoroSplit, we will lower `coro.promise`
back to promise alloca again. So that it will be correctly collected
into coro frame. Note that we do not have to bother maintainers of other
middle-end passes.

Fix llvm#105595
Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
@NewSigma
Copy link
Contributor Author

NewSigma commented Jul 7, 2025

Seems there is something wrong

@NewSigma NewSigma closed this Jul 7, 2025
@NewSigma NewSigma deleted the erase-spilled-lifetimes branch July 7, 2025 08:15
@Endilll Endilll removed their request for review July 7, 2025 08:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.