release/20.x: [CoroSplit] Always erase lifetime intrinsics for spilled allocas (#142551) #147248
Closed
Conversation
…lvm#138298) 783a846 changed VPScalarIVStepsRecipe to take 3 arguments (adding VF explicitly) instead of 2, but didn't change the corresponding pattern matcher. This matcher was only used in vputils::isHeaderMask, and no test ever reached that function with a ScalarIVSteps recipe for the value being matched -- it was always a WideCanonicalIV. So the matcher bailed out immediately, before checking arguments and asserting that the number of arguments in the recipe was the same as the number expected by the matcher. Since the constructors for ScalarIVSteps take 3 values, we should be safe to update the matcher and guard it with a dedicated gtest. m_CanonicalIV(), on the other hand, is removed; as a phi recipe it may not have a consistent number of arguments to match, since it only requires one (the start value) when constructed, with the assumption that a second incoming value is added for the backedge later. In order to keep the matcher we would need multiple matchers with different numbers of arguments, depending on what phase of VPlan construction we were in, and we would have to ensure that we never reorder matcher usage vs. VPlan transformation. Since the main IR PatternMatch.h doesn't contain any matchers for PHI nodes, I think we can just remove it and match via m_Specific() using the VPValue we get from Plan.getCanonicalIV().
When implementing the parsing for OpenACC, I ensured that we could always continue to emit diagnostics, etc. For the most part, I was consistent with that assumption throughout clause Sema, but in 3 cases I left in unreachables for cases that can actually happen. I've now properly handled all 3 in a reasonable way. Fixes: llvm#139894
Flang currently adds loop metadata to a conditional branch in the loop preheader, while clang adds it to the loop latch's branch instruction. LangRef says:

> Currently, loop metadata is implemented as metadata attached to the branch instruction in the loop latch block.
>
> https://llvm.org/docs/LangRef.html#llvm-loop

I misread LangRef a couple of times, but I think this is the appropriate branch op for the LoopAnnotationAttr. In a couple of examples I found that the metadata was lost entirely during canonicalization. This patch makes the codegen look more like clang's, and the annotations persist through codegen.

* current clang: https://godbolt.org/z/8WhbcrnG3
* current flang: https://godbolt.org/z/TrPboqqcn
The GDB pretty printer test works just fine with Clang in the CI now, except that it breaks (not exactly unexpectedly) with optimizations enabled.
The passed indices have to be constant integers anyway, which we verify before creating the ShuffleVectorExpr. Use the value we create there and save the indices using a ConstantExpr instead. This way, we don't have to evaluate the args every time we call getShuffleMaskIdx().
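As a quick illustration of the constraint being relied on here (illustrative code, not part of this patch): the mask arguments of Clang's `__builtin_shufflevector` must be integer constant expressions, which is why the indices can be evaluated once and stored when the `ShuffleVectorExpr` is created.

```cpp
// Illustrative only: all shuffle indices below are integer constant expressions;
// Sema rejects non-constant mask elements before a ShuffleVectorExpr is created.
typedef int v4si __attribute__((vector_size(16)));

v4si reverse_second(v4si a, v4si b) {
  // Indices 0-3 select from 'a', 4-7 select from 'b'; 7, 6, 5, 4 yields 'b' reversed.
  return __builtin_shufflevector(a, b, 7, 6, 5, 4);
}
```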
There were two crashes that have the same root cause: not correctly handling unexpected tokens. In one case, we were failing to return early which caused us to parse a paren as a regular token instead of a special token, causing an assertion. The other case was failing to commit or revert the tentative parse action when not getting a paren when one was expected. Fixes llvm#139665
…m#139784) Static analysis flagged the use of Left.size() because we had just moved out of Left, and that would be undefined behavior. The fix is to take the size before the move and store it in a local variable instead.
…ed (llvm#139820) Fixes llvm#139779. The bug was introduced in llvm#137355 in `SymbolConjured::getStmt`, when trying to obtain a statement for a CFG initializer without an initializer. This commit adds a null check before access.
This patch updates `createWriteOrMaskedWrite` to make it consistent with `createReadOrMaskedRead`. Before diving into the details: note that these utilities are currently implemented in different files — "VectorUtils.cpp" (Vector) and "Vectorization.cpp" (Linalg). In a subsequent patch, I plan to move `createWriteOrMaskedWrite` into "VectorUtils.cpp".

SUMMARY OF CHANGES:

The main change is to remove the logic that creates the destination tensor, which previously looked like:

```cpp
Value dest = builder.create<tensor::EmptyOp>(loc, destSizes,
                                             inputType.getElementType());
```

With this patch, createWriteOrMaskedWrite now simply generates:

```mlir
%res = vector.transfer_write %vectorToStore into %dest
```

This replaces the previous form:

```mlir
%dest = tensor.empty(%destSizes)
%res = vector.transfer_write %vectorToStore into %dest
```

In other words, the destination value `%dest` is now passed as an input parameter. This makes `createWriteOrMaskedWrite` re-usable in contexts where the destination tensor is already known — for example, in `vectorizeAsInsertSliceOp`, which I will update in a follow-up patch.

OTHER CHANGES:

* Added comments and clarified TODOs.
* Updated tests: since destination sizes are now computed independently inside `createWriteOrMaskedWrite`, some additional `tensor.dim` ops appear. These will be cleaned up by CSE + canonicalization.
The `CacheStream::commit()` function (defined in Caching.cpp) deletes the underlying raw stream. Some output streamers may hold a pointer to it, which then outlives the stream object. In particular, MCAsmStreamer keeps a pointer to the raw stream through a separate `formatted_raw_ostream` object, which buffers data and has no path to explicitly flush it. Before this change, the buffered data was flushed in the MCAsmStreamer destructor. After llvm#136121, that happens after the `commit()` function is called, which caused a crash: the `formatted_raw_ostream` object tries to write the buffered data into a deleted raw stream. Even if we didn't delete the stream to avoid the crash, it would be too late, as the output stream cannot accept data after `commit()`. Fixes: llvm#138194.
…vm#139874) In `checkOrAndOpImpliedByOther`, replacing an operand of a disjoint or is unsafe: https://alive2.llvm.org/ce/z/4R4hxN This patch performs the simplification directly, to avoid miscompilation and unnecessary canonicalization. Closes llvm#137937.
## Purpose

Add proper preprocessor guards for all `dump()` methods in the LLVM support library. This change ensures these methods are not part of the public ABI for release builds.

## Overview

* Annotates all `dump` methods in Support and ADT source with the `LLVM_DUMP_METHOD` macro.
* Conditionally includes all `dump` method definitions in Support and ADT source so they are only present on debug/assert builds and when `LLVM_ENABLE_DUMP` is explicitly defined.

NOTE: For many of these `dump` methods, the implementation was already properly guarded but the declaration in the header file was not.

## Background

This issue was raised in comments on llvm#136629. I am addressing it as a separate change since it is independent from the changes being made in that PR. According to [this documentation](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/Compiler.h#L637), `dump` methods should be annotated with `LLVM_DUMP_METHOD` and conditionally included as follows:

```
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void dump() const;
#endif
```

## Validation

* Local release build succeeds.
* CI
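For concreteness, a hedged sketch of the guard pattern quoted from Compiler.h above, applied to both a declaration and its out-of-line definition (the `Widget` class and file names are hypothetical):

```cpp
// Hypothetical Widget.h: the declaration is guarded, not just the definition.
#include "llvm/Support/Compiler.h"

class Widget {
public:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void dump() const;
#endif
};

// Hypothetical Widget.cpp: the definition uses the same guard, so the symbol is not
// part of the ABI of release builds unless LLVM_ENABLE_DUMP is explicitly defined.
#include "llvm/Support/Debug.h"

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void Widget::dump() const { llvm::dbgs() << "Widget\n"; }
#endif
```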
This patch fixes: mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:1544:14: error: unused variable 'destType' [-Werror,-Wunused-variable]
Add dedicated unit tests for the protocol enum types.
This commit documents the process of specifying values for the analyzer options and checker options implemented in the static analyzer, and adds a script which includes the documentation of the analyzer options (which was previously only available through a command-line flag) in the RST-based web documentation.
…lvm#125056) This adds a token for a forward slash to the token definition list, plus the methods `AsmParser::parseSlash()` and `AsmParser::parseOptionalSlash()`, similar to other tokens used as operators (e.g., star, plus, etc.). This allows implementations of attributes that contain arithmetic expressions to support operators with a forward slash, e.g., division. The newly added check tests trigger the parsing of a slash in an attribute.
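A hedged sketch of how an attribute containing a division might use the new hook (the `DivExprAttr` attribute and its `get` builder are hypothetical; `parseInteger` and the newly added `parseOptionalSlash` are the AsmParser entry points being assumed):

```cpp
// Sketch only: parse either "N" or "N / M" inside a custom attribute.
Attribute DivExprAttr::parse(AsmParser &parser, Type type) {
  int64_t lhs = 0, rhs = 0;
  if (parser.parseInteger(lhs))
    return {};
  // Without a trailing slash, the expression is just the single integer.
  if (failed(parser.parseOptionalSlash()))
    return DivExprAttr::get(parser.getContext(), lhs);
  // A slash was consumed: parse the divisor (rejecting zero) and fold the division.
  if (parser.parseInteger(rhs) || rhs == 0)
    return {};
  return DivExprAttr::get(parser.getContext(), lhs / rhs);
}
```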
…lvm#139838) Simplify usage of `getTrailingObjects` for OpenACC/OpenMP Clause by using either the non-templated form for single trailing types or using the single argument form that returns an ArrayRef/MutableArrayRef.
In heatmap mode, report samples and utilization of the section(s) between hot text markers `[__hot_start, __hot_end)`. The intended use is with multi-way splitting where there are several sections that contain "hot" code (e.g. `.text.warm` with CDSplit). Addresses the review comment on llvm#139193. Test Plan: updated heatmap-preagg.test
…9903) Previously the class `ExplodedGraph` had a data member called `Roots` which could (in theory) store multiple root nodes -- but in practice exploded graphs always had at most one root node (zero was possible for empty and partially copied graphs), and introducing a graph with multiple roots would've caused severe malfunctions (e.g. in code that used the pattern `*roots_begin()` to refer to _the_ root node). I don't see any practical use case for adding multiple root nodes in a single graph (this seems to be yet another of the "generalize for the sake of generalization" decisions which were common in the early history of the analyzer), so this commit replaces the vector `Roots` with `ExplodedNode *Root` (which may be null when the graph is empty or under construction). Note that the complicated logic of `ExplodedGraph::trim` deserves a thorough cleanup, but I left that for a follow-up commit.
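A heavily simplified sketch of the shape of this change (member and accessor names are illustrative, not a verbatim excerpt of ExplodedGraph.h):

```cpp
class ExplodedNode;

class ExplodedGraph {
  // Before: std::vector<ExplodedNode *> Roots;  // in practice held at most one node
  // After: a single pointer, null while the graph is empty or under construction.
  ExplodedNode *Root = nullptr;

public:
  ExplodedNode *getRoot() const { return Root; }
  void designateRoot(ExplodedNode *N) { Root = N; }  // illustrative setter name
};
```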
Note that the lambda function we are in returns bool, so FileZero.compare(FileOne) is equivalent to FileZero != FileOne in this context.
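A minimal standalone illustration of that equivalence (toy strings, not the code under review):

```cpp
#include <cassert>
#include <string>

int main() {
  std::string FileZero = "a.cpp", FileOne = "b.cpp";
  // compare() returns a nonzero int exactly when the strings differ, so converting
  // its result to bool (as a bool-returning lambda does) behaves like operator!=.
  bool viaCompare = FileZero.compare(FileOne);
  bool viaNotEqual = FileZero != FileOne;
  assert(viaCompare == viaNotEqual);
  return 0;
}
```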
Don't split i64 load/store when we have Zilsd. In the future, we should enhance the LoadStoreOptimizer pass to do this, but this is a good starting point. Even if we support it in LoadStoreOptimizer, we might still want this for volatile loads/stores to guarantee the use of Zilsd.
This adds basic support for testing the Transport class and includes tests for 'Read' and 'Write'.
The instructions are not supported on either 32-bit ELF (due to no redzone) or 32-bit AIX due to the instructions always using the full 64-bit width of the register inputs.
…133752) This patch enhances the test coverage of `{std,ranges}::swap_ranges` by adding larger test cases with 100 elements across different containers. It also inlines standalone tests for better readability, avoiding unnecessary navigation. This patch addresses a follow-up suggestion from PR llvm#121138 to extend test coverage beyond 3 elements.
Introduces a slight regression in some cases but it'll even out once we disable the promotion in CGP.
…tion (llvm#140029) For a dependent variable template specialization, we don't build a dependent Decl node or a DeclRefExpr to represent it. Instead, we preserve the UnresolvedLookupExpr until instantiation. However, this approach isn't ideal for constraint normalization. We consider the qualifier during profiling, but since that's based on the written code, it can introduce confusing differences, even when the expressions resolve to the same declaration. This change ensures that, if possible, we profile the resolved declaration instead of its qualifier. For expressions that resolve to more than one declaration, we still profile the qualifier, as otherwise we would depend on the order of lookup results. Fixes llvm#139476
The directive was implemented in c87bd2d to support lazy binding and is emitted for vector PCS functions. It's specific to ELF but is currently emitted for all binary formats, crashing on non-ELF targets. Fixes llvm#138260 --------- Co-authored-by: Cullen Rhodes <cullen.rhodes@arm.com>
…eg combine (llvm#140207) The hasOneUse check does not really add anything and makes the combine too restrictive. Upcoming patches benefit from removing the hasOneUse check.
…0208) For the majority of cases, this is a neutral or positive change. There are even test cases that greatly benefit from it, but some regressions are possible. There is llvm#140040 for GlobalISel that would need to be fixed, but it's only a one-instruction regression and I think it can be fixed later. Solves llvm#64591
The "target-features" function attribute is not currently considered when adding vscale_range to a function. When +sve/+sme are pushed onto functions with "#pragma attribute push(+sve/+sme)", the function potentially misses out on optimizations that rely on vscale_range being present.
The previous method splits the vector data into two halves, and shuffle_vector concatenates the two results back into a vector of the original size. This PR eliminates the use of shuffle_vector.
…llvm#139871) This enables file_specific_compile_options to take precedence over ARG_COMPILE_FLAGS. For example, if we add -fno-slp-vectorize to COMPILE_OPTIONS of a file, the behavior changes as follows:

* Before this PR: -fno-slp-vectorize is overwritten by -O3, resulting in SLP vectorizer remaining enabled.
* After this PR: -fno-slp-vectorize overwrites -O3, effectively disabling SLP vectorizer.
Currently coroutine promises are modeled as allocas. This is problematic because other middle-end passes will assume the promise is dead after coroutine suspend, leading to misoptimizations. I propose that front ends remain free to emit and use allocas to model the coro promise. At CoroEarly, we will replace all uses of the promise alloca with `coro.promise`. Non-coroutine passes should only access the promise through `coro.promise`. Then at CoroSplit, we will lower `coro.promise` back to the promise alloca again, so that it will be correctly collected into the coro frame. Note that we do not have to bother maintainers of other middle-end passes. Fix llvm#105595
Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
Seems there is something wrong.
Backport #142551 for #124612