Merge 20.11.2024 #209

ergawy · 2024-11-20T11:27:46Z

No description provided.

…ed (llvm#116480) As mentioned in the title, the missing `consumeError` triggers assertions.
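As a rough illustration of why a missing `consumeError` trips assertions, here is a toy model of a "must be checked" error. `CheckedError` and its behavior are hypothetical simplifications, not `llvm::Error` itself: destroying an unhandled instance aborts in builds without `NDEBUG`, and passing it to `consumeError` marks it as handled.

```cpp
#include <cstdlib>
#include <string>
#include <utility>

// Toy "must be checked" error (hypothetical, not llvm::Error): if it is
// destroyed without being handled, assertion-enabled builds abort.
class CheckedError {
  std::string Msg;
  bool Checked = false;

public:
  explicit CheckedError(std::string M) : Msg(std::move(M)) {}

  // Moving transfers the obligation: the moved-from object counts as checked.
  CheckedError(CheckedError &&O) noexcept
      : Msg(std::move(O.Msg)), Checked(O.Checked) {
    O.Checked = true;
  }
  CheckedError(const CheckedError &) = delete;

  ~CheckedError() {
#ifndef NDEBUG
    if (!Checked)
      std::abort(); // models the "unhandled error" assertion
#endif
  }

  bool isChecked() const { return Checked; }
  const std::string &message() const { return Msg; }

  // Explicitly discard the error, marking it as handled.
  friend void consumeError(CheckedError E) { E.Checked = true; }
};
```

Dropping the `consumeError` call in the usage below would make the program abort when `Err` goes out of scope.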

@mshockwave

In MCA, the load/store unit is modeled through an `LSUnitBase` class. Judging from the name `LSUnitBase`, I believe there is an intent to allow for different specialized load/store unit implementations. (However, currently there is only one implementation used in-tree, `LSUnit`.) PR llvm#101534 fixed one instance where the specialized `LSUnit` was hard-coded, opening the door for other subclasses to be used, but what subclasses can do is, in my opinion, still overly limited due to a reliance on the `MemoryGroup` class, e.g. [here](https://github.com/llvm/llvm-project/blob/8b55162e195783dd27e1c69fb4d97971ef76725b/llvm/lib/MCA/HardwareUnits/Scheduler.cpp#L88).

The `MemoryGroup` class is currently used in the default `LSUnit` implementation to model data dependencies/hazards in the pipeline. `MemoryGroups` form a graph of memory dependencies that inform the scheduler when load/store instructions can be executed relative to each other. In my eyes, this is an implementation detail. Other `LSUnit`s may want to keep track of data dependencies in different ways. As a concrete example, a downstream use I am working on<sup>[1]</sup> uses a custom load/store unit that makes use of available aliasing information. I haven't been able to shoehorn our additional aliasing information into the existing `MemoryGroup` abstraction. I think there is no need to force subclasses to use `MemoryGroup`s; users of `LSUnitBase` are only concerned with when, and for how long, a load/store instruction executes.

This PR instead leaves it up to the subclasses how to model such dependencies, and only prescribes an abstract interface in `LSUnitBase`. It also moves data members and methods that are not necessary to provide an abstract interface from `LSUnitBase` to the `LSUnit` subclass. I decided to make `MemoryGroup` a protected nested class of `LSUnit`; that way, specializations may inherit from `LSUnit` and still make use of `MemoryGroup`s if they wish to do so (e.g. if they only want to override the `dispatch` method).

**Drawbacks / Considerations**

My reason for suggesting this PR is an out-of-tree use. As such, these changes don't introduce any new functionality for in-tree LLVM uses. However, in my opinion, these changes improve code clarity and prescribe a clear interface, which would be the main benefit for the LLVM community.

A drawback of the more abstract interface is that virtual dispatch is used in more places. However, note that virtual dispatch is already used in some critical parts of `LSUnitBase`, e.g. the `isAvailable` and `dispatch` methods. As a quick check to ensure these changes don't significantly impact performance, I also ran `time llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=3000 llvm/test/tools/llvm-mca/X86/BtVer2/dot-product.s` before and after the changes; there was no observable difference in runtimes (`0.292 s` total before, `0.286 s` total after changes).

<sup>[1]: MCAD started by @mshockwave and @chinmaydd.</sup>
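A minimal C++ sketch of the idea, assuming a simplified instruction handle: the scheduler-facing interface only answers *when* a memory instruction may issue, and each subclass decides how dependencies are tracked. `AbstractLSUnit`, `InOrderLSUnit`, and their methods are hypothetical names chosen for illustration, not the actual `LSUnitBase` API.

```cpp
#include <deque>

// Hypothetical stand-in for an MCA instruction reference; only an ID is needed.
struct InstRef {
  unsigned ID;
};

// Illustrative abstract interface: callers ask whether a memory instruction
// can issue now; how dependencies are modeled is up to the subclass.
class AbstractLSUnit {
public:
  virtual ~AbstractLSUnit() = default;
  virtual void dispatch(const InstRef &IR) = 0;
  virtual bool isReady(const InstRef &IR) const = 0;
  virtual void onInstructionExecuted(const InstRef &IR) = 0;
};

// Hypothetical subclass with no MemoryGroup graph at all: it issues memory
// operations strictly in dispatch order.
class InOrderLSUnit : public AbstractLSUnit {
  std::deque<unsigned> Pending; // instruction IDs in dispatch order

public:
  void dispatch(const InstRef &IR) override { Pending.push_back(IR.ID); }

  bool isReady(const InstRef &IR) const override {
    return !Pending.empty() && Pending.front() == IR.ID;
  }

  void onInstructionExecuted(const InstRef &IR) override {
    if (!Pending.empty() && Pending.front() == IR.ID)
      Pending.pop_front();
  }
};
```

The in-order model is deliberately conservative; a subclass with aliasing information could let non-aliasing loads and stores pass each other while keeping the same interface.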

This patch fixes: lldb/source/Host/posix/MainLoopPosix.cpp:64:11: error: unused variable 'bytes_written' [-Werror,-Wunused-variable]

This patch removes clang/Parse/ParseDiagnostic.h because it just forwards to clang/Basic/DiagnosticParse.h.

Identified with misc-include-cleaner.

Add patterns to lower `fmaxnum(fma(a, b, c), 0)` to `fma.rn{.ftz}.relu` for `f16`, `f16x2`, `bf16`, `bf16x2` types, when `nnan` is used. `fma_relu` honours `NaN`, so the substitution is only made if the `fma` is `nnan`, since `fmaxnum` returns the non-NaN argument when passed a NaN value. This patch also removes some `bf16` ftz instructions since `FTZ` is not supported with the `bf16` type, according to the PTX ISA docs.
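To see why the `nnan` requirement matters, the sketch below contrasts the two semantics using `double` as a stand-in for the half-precision types (an assumption for portability; `std::fmax` has the same "return the non-NaN operand" behavior that `fmaxnum` follows). `fma_relu_model` is a hypothetical model of a relu that keeps NaN, not the PTX instruction's exact definition.

```cpp
#include <cmath>

// fmaxnum-style clamp: std::fmax returns the non-NaN operand when exactly one
// operand is NaN, so a NaN fma result is turned into 0.
double fmaxnum_relu(double a, double b, double c) {
  return std::fmax(std::fma(a, b, c), 0.0);
}

// Simplified model of a relu that honours NaN: negative results clamp to 0,
// but a NaN result stays NaN (canonical-NaN details are ignored here).
double fma_relu_model(double a, double b, double c) {
  double r = std::fma(a, b, c);
  if (std::isnan(r))
    return r;
  return r < 0.0 ? 0.0 : r;
}
```

The two functions agree on every input whose `fma` result is not NaN, which is exactly what the `nnan` flag guarantees.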

llvm#115544) The input generating functions for benchmark tests in the GenerateInput.h file can be slightly improved by invoking vector::reserve before calling vector::push_back. This slight performance improvement could potentially speed up all benchmark tests for containers and algorithms that use these functions as inputs.
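A minimal sketch of the pattern, assuming a generator that fills a `std::vector<int>` with random values; `makeRandomInputs` is a hypothetical name, not one of the functions in GenerateInput.h.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical input generator: reserving capacity up front means the
// push_back loop never reallocates and copies already-stored elements.
std::vector<int> makeRandomInputs(std::size_t N) {
  std::mt19937 Gen(42); // fixed seed so benchmark inputs are reproducible
  std::uniform_int_distribution<int> Dist(0, 1000);
  std::vector<int> Inputs;
  Inputs.reserve(N); // single allocation for all N elements
  for (std::size_t I = 0; I < N; ++I)
    Inputs.push_back(Dist(Gen));
  return Inputs;
}
```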

…conditions (llvm#116627) This patch bails out on non-dedicated exits to avoid adding exiting conditions to an invalid context. Closes llvm#116553.

…m#116621) It also modifies the error message to specify it is the dependence-type that is not supported. Resolves the crash in llvm#115647. A fix can come in later as part of future OpenMP version support.

This reverts commit 4f48a81. The newly added test was failing on the public macOS Arm64 bots:

```
======================================================================
FAIL: test_column_breakpoints (TestDAP_breakpointLocations.TestDAP_setBreakpoints)
Test retrieving the available breakpoint locations.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_breakpointLocations.py", line 77, in test_column_breakpoints
    self.assertEqual(
AssertionError: Lists differ: [{'co[70 chars]e': 41}, {'column': 3, 'line': 42}, {'column': 18, 'line': 42}] != [{'co[70 chars]e': 42}, {'column': 18, 'line': 42}]

First differing element 2:
{'column': 3, 'line': 41}
{'column': 3, 'line': 42}

First list contains 1 additional elements.
First extra element 4:
{'column': 18, 'line': 42}

  [{'column': 39, 'line': 40},
   {'column': 51, 'line': 40},
-  {'column': 3, 'line': 41},
   {'column': 3, 'line': 42},
   {'column': 18, 'line': 42}]
Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 1.554s

FAILED (failures=1)
```

Makes `emitc.func` implement the `OpAsmOpInterface` and override `getDefaultDialect`. This allows ops inside `emitc.func`'s body to omit the 'emitc.' prefix in the assembly.

The special-case predicate handling is quite messy. Currently, broad `let` blocks override specific predicates with more general ones. For instructions with SDWA, the HasSDWA predicate was overriding the SubtargetPredicate for the instruction. This patch fixes enough to properly disallow new instructions that support SDWA on older targets.

This patch adds InstrProfWriter::addMemProfData, which adds the complete MemProf profile (frames, call stacks, and records) to the writer context. Without this function, functions like loadInput in llvm-profdata.cpp and InstrProfWriter::mergeRecordsFromWriter must add one item (frame, call stack, or record) at a time. The new function std::moves the entire MemProf profile to the writer context if the destination is empty, which is the common use case. Otherwise, we fall back to adding one item at a time behind the scenes.

Here are a couple of reasons why we should add this function:

- We've had a bug where we forgot to add one of the three data structures (frames, call stacks, and records) to the writer context, resulting in a nearly empty indexed profile. We should always package the three data structures together, especially on API boundaries.
- We expose a little too much of the MemProf detail to InstrProfWriter. I'd like to gradually transform InstrProfReader/Writer to entities managing buffers (sequences of bytes), with actual serialization/deserialization left to external classes. We already do some of this in InstrProfReader, where InstrProfReader "contracts out" to IndexedMemProfReader to handle MemProf details.

I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter for now because MemProfReader uses DenseMap for frames and call stacks, whereas MemProfData uses MapVector. I'll resolve these mismatches in subsequent patches.
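The "move if empty, otherwise merge item by item" behavior can be sketched as follows. `ToyMemProfData` and `ToyWriter` are hypothetical, heavily simplified stand-ins (plain `std::map`s instead of the real MemProf containers), and the fallback's conflict policy (existing keys win) is an assumption made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical simplified profile: three maps that always travel together.
struct ToyMemProfData {
  std::map<uint64_t, std::string> Frames;
  std::map<uint64_t, std::string> CallStacks;
  std::map<uint64_t, uint64_t> Records;
};

class ToyWriter {
  ToyMemProfData Data;

public:
  void addMemProfData(ToyMemProfData Incoming) {
    // Common case: nothing added yet, so take ownership of everything at once.
    if (Data.Frames.empty() && Data.CallStacks.empty() && Data.Records.empty()) {
      Data = std::move(Incoming);
      return;
    }
    // Fallback: merge one item at a time. std::map::insert keeps existing
    // keys; a real writer would define its own merge policy.
    Data.Frames.insert(Incoming.Frames.begin(), Incoming.Frames.end());
    Data.CallStacks.insert(Incoming.CallStacks.begin(), Incoming.CallStacks.end());
    Data.Records.insert(Incoming.Records.begin(), Incoming.Records.end());
  }

  const ToyMemProfData &get() const { return Data; }
};
```

Taking all three components in one parameter is the point of the design: a caller cannot forget one of them.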

Otherwise, LLD_IN_TEST=2 testing arm-plt-reloc.s crashes. Follow-up to https://reviews.llvm.org/D150870

The custom -- parsing from https://reviews.llvm.org/D102665 can be replaced with the generic feature from https://reviews.llvm.org/D152286 Pull Request: llvm#116565

…#115934) llvm#109711 disables `buildCFGChains()` when `-apply-ext-tsp-for-size` is used to improve codesize. Tail merging can change the layout and normally requires `buildCFGChains()` to be called again, but we want to prevent this when optimizing for codesize. We saw a slight size improvement on large binaries with this change. If `-apply-ext-tsp-for-size` is not used, this should be NFC.

…vm#115968) This will help us catch mistakes in change tracking. It's only enabled when EXPENSIVE_CHECKS are enabled.

This patch adds support for getting even-odd general purpose register pairs into and out of inline assembly using the `R` constraint as proposed in riscv-non-isa/riscv-c-api-doc#92

There are a few different pieces to this patch, each of which needs its own explanation.

- Renames the Register Class used for f64 values on rv32i_zdinx from `GPRPair*` to `GPRF64Pair*`. These register classes are kept broadly unmodified, as their primary value type is used for type inference over selection patterns. This rename affects quite a lot of files.
- Adds new `GPRPair*` register classes which will be used for `R` constraints and for instructions that need an even-odd GPR pair. This new type is used for `amocas.d.*` (rv32) and `amocas.q.*` (rv64) in Zacas, instead of the `GPRF64Pair` class being used before.
- Marks the new `GPRPair` class as legal for holding an `MVT::Untyped`. Two new RISCVISD node types are added for creating and destructing a pair, `BuildGPRPair` and `SplitGPRPair`, and are introduced when bitcasting to/from the pair type and `untyped`.
- Adds functionality to `splitValueIntoRegisterParts` and `joinRegisterPartsIntoValue` to handle changing `i<2*xlen>` MVTs into `untyped` pairs.
- Adds an override for `getNumRegisters` to ensure that `i<2*xlen>` values, when going to/from inline assembly, only allocate one (pair) register (they would otherwise allocate two). This is due to a bug in SelectionDAGBuilder.cpp which other backends also work around.
- Ensures that Clang understands that `R` is a valid inline assembly constraint.
- This also allows `R` to be used for `f64` types on `rv32_zdinx` architectures, where doubles are stored in a GPR pair.

…vert" (llvm#116666) Reverts llvm#115968. It caused buildbot failures.

…2 to GPRRegClass for RV64. (llvm#116165) This is an alternative fix for llvm#81192. This allows the SelectionDAG scheduler to find a representative register class for i32 on RV64. The representative register class is the super register class with the largest spill size that is also legal. The default implementation of findRepresentativeClass only works for legal types, which i32 is not for RV64. I did some investigation of why tablegen uses i32 in output patterns on RV64. It appears it comes down to a function called ForceArbitraryInstResultType that picks a type for the output pattern when the isel pattern isn't specific enough. I believe it picks the smallest type (lowest numbered) to resolve the conflict. A similar issue occurs for f16 and bf16, which both use the FPR16 register class. If the isel pattern doesn't specify, tablegen may find both f16 and bf16 and may pick bf16 from a Zfh pattern when Zfbfmin isn't present. Since bf16 isn't legal in that case, findRepresentativeClass will fail. For i8, i16, i32, this patch calls the base class with XLenVT to get the representative class since XLenVT is always legal. For bf16/f16, we call the base class with f32 since all of the f16/bf16 extensions depend on either F or Zfinx, which will make f32 a legal type. The final representative register class further depends on whether D or Zdinx is also enabled, but that should be handled by the default implementation.
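The remapping can be sketched as a small type-to-type function. The `VT` enum and `isLegal` below are hypothetical simplifications of LLVM's MVT and legality queries, assuming an RV64 target with F and D enabled (so `i64`, `f32`, and `f64` are legal).

```cpp
// Hypothetical, trimmed-down value types; not LLVM's MVT enum.
enum class VT { i8, i16, i32, i64, f16, bf16, f32, f64 };

// Stand-in legality query for an assumed RV64 target with F and D:
// only XLenVT (i64), f32, and f64 are legal.
bool isLegal(VT T) { return T == VT::i64 || T == VT::f32 || T == VT::f64; }

// Type to hand to the default findRepresentativeClass logic: narrow integers
// are queried as XLenVT, and f16/bf16 as f32, so the query type is legal.
VT typeForRepresentativeQuery(VT T, VT XLenVT) {
  switch (T) {
  case VT::i8:
  case VT::i16:
  case VT::i32:
    return XLenVT;
  case VT::f16:
  case VT::bf16:
    return VT::f32;
  default:
    return T;
  }
}
```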

…llvm#116434) SBBreakpointName has a typedef for BreakpointHitCallback used in SetCallback(), but this typedef has been commented out in SBBreakpointName and added instead to SBDefines. Since SB API callbacks are placed in SBDefines, this commit removes this commented out portion.

…m#112596) Allow LLDB to parse the dynamic symbol table from an ELF file or memory image in an ELF file that has no section headers. This patch uses the ability to parse the PT_DYNAMIC segment and find the DT_SYMTAB, DT_SYMENT, DT_HASH or DT_GNU_HASH to find and parse the dynamic symbol table if the section headers are not present. It also adds a helper function to read data from a .dynamic key/value pair entry correctly from the file or from memory.
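A sketch of the lookup step, assuming the `PT_DYNAMIC` contents have already been read (from the file or from memory) into 64-bit tag/value pairs. The tag values are the standard ELF ones; `DynEntry`, `DynSymInfo`, and `findDynSym` are hypothetical helpers, not LLDB's implementation. The symbol count is not stored in the dynamic section itself, which is why a hash table entry is also required.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Simplified 64-bit dynamic entry (same shape as Elf64_Dyn: tag + value).
struct DynEntry {
  int64_t Tag;
  uint64_t Val;
};

// Standard ELF dynamic tags used to locate the dynamic symbol table.
constexpr int64_t DT_NULL = 0;
constexpr int64_t DT_HASH = 4;
constexpr int64_t DT_SYMTAB = 6;
constexpr int64_t DT_SYMENT = 11;
constexpr int64_t DT_GNU_HASH = 0x6ffffef5;

struct DynSymInfo {
  uint64_t SymTabAddr = 0;
  uint64_t SymEntSize = 0;
  std::optional<uint64_t> HashAddr;    // DT_HASH, if present
  std::optional<uint64_t> GnuHashAddr; // DT_GNU_HASH, if present
};

// Walk the dynamic entries until DT_NULL and collect what is needed to
// find the dynamic symbols without any section headers.
std::optional<DynSymInfo> findDynSym(const std::vector<DynEntry> &Dyn) {
  DynSymInfo Info;
  bool HaveSymTab = false;
  for (const DynEntry &E : Dyn) {
    if (E.Tag == DT_NULL)
      break; // end of the dynamic array
    switch (E.Tag) {
    case DT_SYMTAB: Info.SymTabAddr = E.Val; HaveSymTab = true; break;
    case DT_SYMENT: Info.SymEntSize = E.Val; break;
    case DT_HASH: Info.HashAddr = E.Val; break;
    case DT_GNU_HASH: Info.GnuHashAddr = E.Val; break;
    default: break;
    }
  }
  // Without a hash table there is no way to know how many symbols exist.
  if (!HaveSymTab || (!Info.HashAddr && !Info.GnuHashAddr))
    return std::nullopt;
  return Info;
}
```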

…sis`. (llvm#114615) The plugin analysis for `InlineAdvisor` and `InlineOrder` currently relies on shared global state to keep track of whether the analysis is available. This causes issues when pipelines using plugins and pipelines not using plugins are run in the same process. The shared global state can be easily replaced by checking whether the plugin analysis has been registered in the given `ModuleAnalysisManager` instance.

…vm#81299) Add integer promotion support for VP_LOAD and VP_STORE via legalization of extend and truncate of each form. Patch commandeered from: https://reviews.llvm.org/D109377

Use update_llc_test_checks.py to automate the test checks in some files I was observing changes in locally.

Mostly a stub, but adds some baseline tests and tests for removed instructions.

…vm#116308) gfx12 and gfx950 managed to produce 3 different permutations of this feature. gfx12 supports f32 and f16, and gfx950 supports f32 and v2f16.

…16309)

…uilds. NFC. (llvm#116744) Forcibly disable the libc++ benchmarks on Windows build hosts. The benchmark configuration currently does not support cross builds on Windows hosts. Also removed the unnecessary `CMAKE_CROSSCOMPILING` CMake option.

llvm#104748) This patch adds parallelization support for the following expression in OpenMP workshare constructs: * Elemental procedures in array expressions

…loop_nest (llvm#104748)" This reverts commit 40c8938. Linking errors in buildbot build

The situation that required symbol versions on the LLVM shared library can also happen for clang-cpp, although it is less common: different tools require different versions of the library, and through transitive dependencies a process ends up with multiple copies of clang-cpp. This causes havoc with ELF, because calls meant to go to one version of the library end up with another. I've also considered introducing a symbol version globally, but, for example, the clang (C) library and other targets outside of LLVM/Clang, e.g. libc++, would not want that. So it's probably best if we keep it to those libraries.

…lvm#114398) Depends on llvm#114508

The LoongArch Reference Manual says that the 3-register atomic memory operations cannot have their rd equal to either rj or rk [^1], and both GNU as and LLVM IAS enforce the constraint for non-zero rd. However, the LoongArch AsmParser currently checks for the opcode with a direct numerical comparison, even though the opcode is enum-typed: the fact that all AMO insns have adjacent numerical values is merely a coincidence, and it is better not to rely on the current TableGen implementation behavior.

Instead, start to leverage the target-specific flags field of MCInstrDesc, and record the constraint with TableGen, so we can stop treating the opcode value as a number. In doing so, we also have to mark whether the instruction is AMCAS, because the operand index of rj and rk for the AMCAS instructions is different.

While documenting the new flag, it was found that v1.10 of the Manual did not specify the similar constraint for the AMCAS instructions. Experiments were done on a Loongson 3A6000 (LA664 uarch) and it turned out that at least AMCAS will still signal INE with `rd == rj`. The `rd == rk` case should be a no-op according to the semantics, but as it is meaningless to perform CAS with the "old value" same as the "new value", it is not worth special-casing. So the current behavior of also enforcing the constraint for AMCAS is kept.

[^1]: if `rd == rj` an INE would be signaled; if `rd == rk` it is UB.
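A sketch of the flag-based check, assuming hypothetical flag bits and operand layouts: `(rd, rk, rj)` for ordinary AMOs and `(rd, rd_in, rk, rj)` for AMCAS, where `rd` is also read. The real TSFlags encoding and operand order live in the LoongArch TableGen files and may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical target-specific flag bits (not the real LoongArch encoding).
constexpr uint64_t IsAMO = 1u << 0;   // 3-register atomic memory operation
constexpr uint64_t IsAMCAS = 1u << 1; // AMCAS: rj/rk at different operand indices

// Returns true if a non-zero rd equals rj or rk. The decision is driven by
// flags recorded per instruction, not by comparing opcode numbers.
bool violatesAMOConstraint(uint64_t TSFlags, const std::vector<unsigned> &Regs) {
  if (!(TSFlags & IsAMO))
    return false;
  // Assumed layouts: (rd, rk, rj), or (rd, rd_in, rk, rj) for AMCAS.
  const std::size_t Base = (TSFlags & IsAMCAS) ? 2 : 1;
  const unsigned Rd = Regs[0], Rk = Regs[Base], Rj = Regs[Base + 1];
  if (Rd == 0)
    return false; // rd == r0 (zero register) is exempt
  return Rd == Rj || Rd == Rk;
}
```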

It seems we can get there with MSVC if LLVM_BUILD_LLVM_DYLIB_VIS is set. Slightly surprising because I didn't know that MSVC supports the flag -Bsymbolic-functions, but let's play it safe.

This patch adds computeUndriftMap, a function to compute mappings from source locations in the MemProf profile to source locations in the IR.

This patch adds MemProfReader::takeMemProfData, a function to return the complete MemProf profile from the reader. We can directly pass its return value to InstrProfWriter::addMemProfData without having to deal with the individual components of the MemProf profile. The new function is named "take", but it doesn't do std::move yet because of type differences (DenseMap vs. MapVector). The end state I'm trying to get to is roughly as follows:

- MemProfReader accepts IndexedMemProfData as a parameter as opposed to the three individual components (frames, call stacks, and records).
- MemProfReader keeps IndexedMemProfData as a class member without decomposing it into its individual components.
- MemProfReader returns IndexedMemProfData like: `IndexedMemProfData takeMemProfData() { return std::move(MemProfData); }`
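A minimal sketch of the end state described in the list, using a hypothetical `SimpleProfile` type in place of `IndexedMemProfData`: the reader owns the profile as a single unit and hands it over by moving, so callers never touch the individual components.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in bundling the three components together.
struct SimpleProfile {
  std::map<uint64_t, std::string> Frames;
  std::map<uint64_t, std::string> CallStacks;
  std::map<uint64_t, uint64_t> Records;
};

// Hypothetical reader that stores the profile undecomposed.
class SimpleReader {
  SimpleProfile MemProfData;

public:
  explicit SimpleReader(SimpleProfile Data) : MemProfData(std::move(Data)) {}

  // Transfers ownership; the reader's copy is left in a moved-from state.
  SimpleProfile takeMemProfData() { return std::move(MemProfData); }
};
```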

Identified with misc-include-cleaner.

Summary: Currently, the RPC interface uses a basic opcode to communicate with the server. This is currently 16 bits. There's no reason for this to be 16 bits, because on the GPU a 32-bit write is the same as a 16-bit write performance-wise. Additionally, I am now making all the `libc` based opcodes qualified with the 'c' type, mimicking how Linux handles `ioctls` all coming from the same driver. This will make it easier to extend the interface when it's exported directly.
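A sketch of what a type-qualified 32-bit opcode could look like. The layout (type character in the top byte, opcode number in the low 24 bits) and the opcode names are assumptions for illustration, not necessarily the encoding used by LLVM libc.

```cpp
#include <cstdint>

// Hypothetical layout: owner "type" character in bits 24-31, per-owner
// opcode number in bits 0-23, similar in spirit to the ioctl type field.
constexpr uint32_t makeOpcode(char Type, uint32_t Number) {
  return (static_cast<uint32_t>(static_cast<unsigned char>(Type)) << 24) |
         (Number & 0x00FFFFFFu);
}

constexpr char opcodeType(uint32_t Opcode) {
  return static_cast<char>(Opcode >> 24);
}

constexpr uint32_t opcodeNumber(uint32_t Opcode) {
  return Opcode & 0x00FFFFFFu;
}

// Hypothetical libc opcodes, all qualified with the 'c' type.
constexpr uint32_t OPCODE_WRITE = makeOpcode('c', 1);
constexpr uint32_t OPCODE_READ = makeOpcode('c', 2);
```

With the owner encoded in the opcode, another client can reuse small opcode numbers under its own type character without colliding with the `libc` ones.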

…lvm#116000) This piece of code made the program crash:

```Verilog
function pkg::t get (int t = 2, int f = 2);
```

The way the code is supposed to be parsed is that UnwrappedLineParser should identify the function header, and then TokenAnnotator should recognize the result. But the code in UnwrappedLineParser would mistakenly not recognize it due to the `::`. Then TokenAnnotator would recognize the comma both as TT_VerilogInstancePortComma and TT_VerilogTypeComma. The code for annotating the instance port comma used `setFinalizedType`. The program would crash when it tried to set it to another type. The code in UnwrappedLineParser now recognizes the `::` token. There are other cases in which TokenAnnotator would recognize the comma as both of those types, for example if the `function` keyword is removed. The type is now set using `setType` instead so that the program does not crash. The developer no longer knows why he used `setFinalizedType` back then.

This restores the code to its original state before I experimented with making i32 a legal type.

The next change will change Partition::phdrs to a unique_ptr vector, which requires PhdrEntry to be a complete type. And make OutputSection::getLMA out-of-line, since it should not include either SyntheticSections.h or Writer.h.

kuhar · 2024-11-20T14:43:12Z

@ergawy I think this has just notified ~127 llvm contributors whose changes you are pulling in

ergawy · 2024-11-20T14:48:19Z

> @ergawy I think this has just notified ~127 llvm contributors whose changes you are pulling in

I added you as a reviewer by mistake (a GH suggestion; I removed you right away), so maybe that's why you were notified. Hopefully, not everyone was. Sorry if so, it was not intentional.

kuhar · 2024-11-20T14:50:43Z

I made the guess based on this:

ergawy · 2024-11-20T14:52:45Z

> I made the guess based on this: ...

Reasonable guess. Hopefully, not all of them were notified, only the 2 people added as reviewers by mistake.

romainthomas and others added 30 commits November 18, 2024 15:23

[PDB] Fix missing consumeError which raise error with asserts enabl…

a52cb0a

…ed (llvm#116480) As mentioned in the title, the missing `consumeError` triggers assertions.

[lldb] Fix a warning

6e1acdc

This patch fixes: lldb/source/Host/posix/MainLoopPosix.cpp:64:11: error: unused variable 'bytes_written' [-Werror,-Wunused-variable]

[Parse] Remove ParseDiagnostic.h (llvm#116496)

834dfd2

This patch removes clang/Parse/ParseDiagnostic.h because it just forwards to clang/Basic/DiagnosticParse.h.

[Target] Remove unused includes (NFC) (llvm#116577)

ed8019d

Identified with misc-include-cleaner.

[ConstraintElim] Bail out on non-dedicated exits when adding exiting …

52361d0

…conditions (llvm#116627) This patch bails out on non-dedicated exits to avoid adding exiting conditions to an invalid context. Closes llvm#116553.

[Flang][OpenMP] Error gracefully for dependence-type with depobj (llv…

dcd6207

…m#116621) It also modifies the error message to specify it is the dependence-type that is not supported. Resolves the crash in llvm#115647. A fix can come in later as part of future OpenMP version support.

emitc: func: Set default dialect to 'emitc' (llvm#116297)

68a3908

Makes `emitc.func` implement the `OpAsmOpInterface` and override `getDefaultDialect`. This allows ops inside `emitc.func`'s body to omit the 'emitc.' prefix in the assembly.

[ELF,ARM] Move global sectionMap into the ARM class

4092c0d

Otherwise, LLD_IN_TEST=2 testing arm-plt-reloc.s crashes. Follow-up to https://reviews.llvm.org/D150870

[llvm-objcopy] Replace custom -- parsing with DashDashParsing

2444b6f

The custom -- parsing from https://reviews.llvm.org/D102665 can be replaced with the generic feature from https://reviews.llvm.org/D152286 Pull Request: llvm#116565

[Mips] Use APInt::isMask/isShiftedMask to simplify code. (llvm#116582)

1c4caec

[flang][cuda] Materialize box when src or dst are rebox (llvm#116494)

de2e270

[SandboxIR] Add debug checker to compare IR before/after a revert (ll…

9161e6a

…vm#115968) This will help us catch mistakes in change tracking. It's only enabled when EXPENSIVE_CHECKS are enabled.

Revert "[SandboxIR] Add debug checker to compare IR before/after a re…

0ae58c4

…vert" (llvm#116666) Reverts llvm#115968. It caused buildbot failures.

[SelectionDAG] Support integer promotion for VP_LOAD and VP_STORE (ll…

ed8ebad

…vm#81299) Add integer promotion support for VP_LOAD and VP_STORE via legalization of extend and truncate of each form. Patch commandeered from: https://reviews.llvm.org/D109377

[NVPTX][NFC] Regenerate some tests checks (llvm#116605)

18be88e

Use update_llc_test_checks.py to automate the test checks in some files I was observing changes in locally.

AMDGPU: Add gfx950 subtarget definitions (llvm#116307)

a6fc489

Mostly a stub, but adds some baseline tests and tests for removed instructions.

AMDGPU: Add subtarget features for minimum3/maximum3 instructions (ll…

cab7328

…vm#116308) gfx12 and gfx950 managed to produce 3 different permutations of this feature. gfx12 supports f32 and f16, and gfx950 supports f32 and v2f16.

AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (llvm#1…

5a556d5

…16309)

vvereschaka and others added 19 commits November 19, 2024 17:42

[flang] Introduce hlfir.elemental lowerings to omp.workshare_loop_nest (

40c8938

llvm#104748) This patch adds parallelization support for the following expression in OpenMP workshare constructs: * Elemental procedures in array expressions

Revert "[flang] Introduce hlfir.elemental lowerings to omp.workshare_…

fa22100

…loop_nest (llvm#104748)" This reverts commit 40c8938. Linking errors in buildbot build

Disable symbol versions for clang-cpp also with MSVC

b246d5f

It seems we can get there with MSVC if LLVM_BUILD_LLVM_DYLIB_VIS is set. Slightly surprising because I didn't know that MSVC supports the flag -Bsymbolic-functions, but let's play it safe.

[llvm-profdata] fix typo (llvm#116754)

314e9b1

[memprof] Add computeUndriftMap (llvm#116478)

a2e266b

This patch adds computeUndriftMap, a function to compute mappings from source locations in the MemProf profile to source locations in the IR.

[ExecutionEngine] Remove unused includes (NFC) (llvm#116749)

1f4d91e

Identified with misc-include-cleaner.

[Object] Remove unused includes (NFC) (llvm#116750)

e9c8106

Identified with misc-include-cleaner.

[ProfileData] Remove unused includes (NFC) (llvm#116751)

4f1b20f

Identified with misc-include-cleaner.

[RISCV] Add IsRV32 some patterns in RISCVInstrInfoXTHead.td.

2bf6751

This restores the code to its original state before I experimented with making i32 a legal type.

[NFC] [Coroutines] Use std::move to avoid copying (llvm#116776)

c6bce68

[ELF] Move PhdrEntry to SyntheticSections

fa4d186

The next change will change Partition::phdrs to a unique_ptr vector, which requires PhdrEntry to be a complete type. And make OutputSection::getLMA out-of-line, since it should not include either SyntheticSections.h or Writer.h.

Merge remote-tracking branch 'upstream/main' into amd-trunk-dev

33834ce

Post-merge fixes

c4bca6c

ergawy requested review from antiagainst and kuhar as code owners November 20, 2024 11:27

ergawy requested review from dpalermo and agozillon and removed request for antiagainst and kuhar November 20, 2024 11:28

ergawy merged commit 109ac78 into ROCm:amd-trunk-dev Nov 21, 2024
9 of 11 checks passed
