Skip to content

Merge 20.11.2024 #209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 620 commits into from
Nov 21, 2024
Merged

Merge 20.11.2024 #209

merged 620 commits into from
Nov 21, 2024

Conversation

ergawy
Copy link

@ergawy ergawy commented Nov 20, 2024

No description provided.

romainthomas and others added 30 commits November 18, 2024 15:23
…ed (llvm#116480)

As mentioned in the title, the missing `consumeError` triggers assertions.
In MCA, the load/store unit is modeled through a `LSUnitBase` class.
Judging from the name `LSUnitBase`, I believe there is an intent to
allow for different specialized load/store unit implementations.
(However, currently there is only one implementation used in-tree,
`LSUnit`.)

PR llvm#101534 fixed one instance where the specialized `LSUnit` was
hard-coded, opening the door for other subclasses to be used, but what
subclasses can do is, in my opinion, still overly limited due to a
reliance on the `MemoryGroup` class, e.g.
[here](https://github.com/llvm/llvm-project/blob/8b55162e195783dd27e1c69fb4d97971ef76725b/llvm/lib/MCA/HardwareUnits/Scheduler.cpp#L88).

The `MemoryGroup` class is currently used in the default `LSUnit`
implementation to model data dependencies/hazards in the pipeline.
`MemoryGroups` form a graph of memory dependencies that inform the
scheduler when load/store instructions can be executed relative to each
other.

In my eyes, this is an implementation detail. Other `LSUnit`s may want
to keep track of data dependencies in different ways. As a concrete
example, a downstream use I am working on<sup>[1]</sup> uses a custom
load/store unit that makes use of available aliasing information. I
haven't been able to shoehorn our additional aliasing information into
the existing `MemoryGroup` abstraction. I think there is no need to
force subclasses to use `MemoryGroup`s; users of `LSUnitBase` are only
concerned with when, and for how long, a load/store instruction
executes.

This PR makes changes to instead leave it up to the subclasses how to
model such dependencies, and only prescribes an abstract interface in
`LSUnitBase`. It also moves data members and methods that are not
necessary to provide an abstract interface from `LSUnitBase` to the
`LSUnit` subclass. I decided to make the `MemoryGroup` a protected
subclass of `LSUnit`; that way, specializations may inherit from
`LSUnit` and still make use of `MemoryGroup`s if they wish to do so
(e.g. if they want to only overwrite the `dispatch` method).

**Drawbacks / Considerations**

My reason for suggesting this PR is an out-of-tree use. As such, these
changes don't introduce any new functionality for in-tree LLVM uses.
However, in my opinion, these changes improve code clarity and prescribe
a clear interface, which would be the main benefit for the LLVM
community.

A drawback of the more abstract interface is that virtual dispatching is
used in more places. However, note that virtual dispatch is already
currently used in some critical parts of the `LSUnitBase`, e.g. the
`isAvailable` and `dispatch` methods. As a quick check to ensure these
changes don't significantly negatively impact performance, I also ran
`time llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2
-iterations=3000 llvm/test/tools/llvm-mca/X86/BtVer2/dot-product.s`
before and after the changes; there was no observable difference in
runtimes (`0.292 s` total before, `0.286 s` total after changes).

<sup>[1]: MCAD started by @mshockwave and @chinmaydd.</sup>
This patch fixes:

  lldb/source/Host/posix/MainLoopPosix.cpp:64:11: error: unused
  variable 'bytes_written' [-Werror,-Wunused-variable]
This patch removes clang/Parse/ParseDiagnostic.h because it just
forwards to clang/Basic/DiagnosticParse.h.
Identified with misc-include-cleaner.
Add patterns to lower `fmaxnum(fma(a, b, c), 0)` to `fma.rn{.ftz}.relu`
for `f16`, `f16x2`, `bf16`, `bf16x2` types, when `nnan` is used.

`fma_relu` honours `NaN`, so the substitution is only made if the `fma`
is `nnan`, since `fmaxnum` returns the non NaN argument when passed a
NaN value.

This patch also removes some `bf16` ftz instructions since `FTZ` is not
supported with the `bf16` type, according to the PTX ISA docs.
llvm#115544)

The input generating functions for benchmark tests in the GenerateInput.h
file can be slightly improved by invoking vector::reserve before calling
vector::push_back. This slight performance improvement could potentially
speed-up all benchmark tests for containers and algorithms that use these
functions as inputs.
…conditions (llvm#116627)

This patch bails out non-dedicated exits to avoid adding exiting
conditions to invalid context.
Closes llvm#116553.
…m#116621)

It also modifies the error message to specify it is the dependence-type
that is not supported.

Resolves the crash in
llvm#115647. A fix can come in
later as part of future OpenMP version support.
This reverts commit 4f48a81.

The newly added test was failing on the public macOS Arm64 bots:
```
======================================================================
FAIL: test_column_breakpoints (TestDAP_breakpointLocations.TestDAP_setBreakpoints)
   Test retrieving the available breakpoint locations.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_breakpointLocations.py", line 77, in test_column_breakpoints
    self.assertEqual(
AssertionError: Lists differ: [{'co[70 chars]e': 41}, {'column': 3, 'line': 42}, {'column': 18, 'line': 42}] != [{'co[70 chars]e': 42}, {'column': 18, 'line': 42}]

First differing element 2:
{'column': 3, 'line': 41}
{'column': 3, 'line': 42}

First list contains 1 additional elements.
First extra element 4:
{'column': 18, 'line': 42}

  [{'column': 39, 'line': 40},
   {'column': 51, 'line': 40},
-  {'column': 3, 'line': 41},
   {'column': 3, 'line': 42},
   {'column': 18, 'line': 42}]
Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 1.554s

FAILED (failures=1)
```
Makes `emitc.func` implement the `OpAsmOpInterface` and overwrite the
`getDefaultDialect`. This allows ops inside `emitc.func`'s body to omit
the 'emitc.' prefix in the assembly.
There are a lot of messes in the special case
predicate handling. Currently broad let blocks
override specific predicates with more general
cases. For instructions with SDWA, the HasSDWA
predicate was overriding the SubtargetPredicate
for the instruction.

This fixes enough to properly disallow new instructions
that support SDWA on older targets.
This patch adds InstrProfWriter::addMemProfData, which adds the
complete MemProf profile (frames, call stacks, and records) to the
writer context.

Without this function, functions like loadInput in llvm-profdata.cpp
and InstrProfWriter::mergeRecordsFromWriter must add one item (frame,
call stack, or record) at a time.  The new function std::moves the
entire MemProf profile to the writer context if the destination is
empty, which is the common use case.  Otherwise, we fall back to
adding one item at a time behind the scene.

Here are a couple of reasons why we should add this function:

- We've had a bug where we forgot to add one of the three data
  structures (frames, call stacks, and records) to the writer context,
  resulting in a nearly empty indexed profile.  We should always
  package the three data structures together, especially on API
  boundaries.

- We expose a little too much of the MemProf detail to
  InstrProfWriter.  I'd like to gradually transform
  InstrProfReader/Writer to entities managing buffers (sequences of
  bytes), with actual serialization/deserialization left to external
  classes.  We already do some of this in InstrProfReader, where
  InstrProfReader "contracts out" to IndexedMemProfReader to handle
  MemProf details.

I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter
for now because MemProfReader uses DenseMap for frames and call
stacks, whereas MemProfData uses MapVector.  I'll resolve these
mismatches in subsequent patches.
Otherwise, LLD_IN_TEST=2 testing arm-plt-reloc.s crashes.

Follow-up to https://reviews.llvm.org/D150870
The custom -- parsing from https://reviews.llvm.org/D102665 can be
replaced with the generic feature from https://reviews.llvm.org/D152286

Pull Request: llvm#116565
…#115934)

llvm#109711 disables
`buildCFGChains()` when `-apply-ext-tsp-for-size` is used to improve
codesize. Tail merging can change the layout and normally requires
`buildCFGChains()` to be called again, but we want to prevent this when
optimizing for codesize. We saw slight size improvement on large
binaries with this change. If `-apply-ext-tsp-for-size` is not used,
this should be a NFC.
…vm#115968)

This will help us catch mistakes in change tracking. It's only enabled
when EXPENSIVE_CHECKS are enabled.
This patch adds support for getting even-odd general purpose register
pairs into and out of inline assembly using the `R` constraint as
proposed in riscv-non-isa/riscv-c-api-doc#92

There are a few different pieces to this patch, each of which need their
own explanation.

- Renames the Register Class used for f64 values on rv32i_zdinx from
  `GPRPair*` to `GPRF64Pair*`. These register classes are kept broadly
  unmodified, as their primary value type is used for type inference
  over selection patterns. This rename affects quite a lot of files.

- Adds new `GPRPair*` register classes which will be used for `R`
  constraints and for instructions that need an even-odd GPR pair. This
  new type is used for `amocas.d.*`(rv32) and `amocas.q.*`(rv64) in
  Zacas, instead of the `GPRF64Pair` class being used before.

- Marks the new `GPRPair` class legal as for holding a `MVT::Untyped`.
  Two new RISCVISD node types are added for creating and destructing a
  pair - `BuildGPRPair` and `SplitGPRPair`, and are introduced when
  bitcasting to/from the pair type and `untyped`.

- Adds functionality to `splitValueIntoRegisterParts` and
  `joinRegisterPartsIntoValue` to handle changing `i<2*xlen>` MVTs into
  `untyped` pairs.

- Adds an override for `getNumRegisters` to ensure that `i<2*xlen>`
  values, when going to/from inline assembly, only allocate one (pair)
  register (they would otherwise allocate two). This is due to a bug in
  SelectionDAGBuilder.cpp which other backends also work around.

- Ensures that Clang understands that `R` is a valid inline assembly
  constraint.

- This also allows `R` to be used for `f64` types on `rv32_zdinx`
  architectures, where doubles are stored in a GPR pair.
…2 to GPRRegClass for RV64. (llvm#116165)

This is an alternative fix for llvm#81192. This allows the SelectionDAG
scheduler to be able to find a representative register class for i32 on
RV64. The representative register class is the super register class with
the largest spill size that is also legal. The default implementation of
findRepresentativeClass only works for legal types which i32 is not for
RV64.

I did some investigation of why tablegen uses i32 in output patterns on
RV64. It appears it comes down to a function called
ForceArbitraryInstResultType that picks a type for the output
pattern when the isel pattern isn't specific enough. I believe it picks
the smallest type(lowested numbered) to resolve the conflict.

A similar issue occurs for f16 and bf16 which both use the FPR16
register class. If the isel pattern doesn't specify, tablegen may find
both f16 and bf16 and may pick bf16 from Zfh pattern when Zfbfmin isn't
present. Since bf16 isn't legal in that case, findRepresentativeClass
will fail.

For i8, i16, i32, this patch calls the base class with XLenVT to get the
representative class since XLenVT is always legal.

For bf16/f16, we call the base class with f32 since all of the f16/bf16
extensions depend on either F or Zfinx which will make f32 a legal type.
The final representative register class further depends on whether D or
Zdinx is also enabled, but that should be handled by the default
implementation.
…llvm#116434)

SBBreakpointName has a typedef for BreakpointHitCallback used in
SetCallback(), but this typedef has been commented out in
SBBreakpointName and added instead to SBDefines. Since SB API callbacks
are placed in SBDefines, this commit removes this commented out portion.
…m#112596)

Allow LLDB to parse the dynamic symbol table from an ELF file or memory
image in an ELF file that has no section headers. This patch uses the
ability to parse the PT_DYNAMIC segment and find the DT_SYMTAB,
DT_SYMENT, DT_HASH or DT_GNU_HASH to find and parse the dynamic symbol
table if the section headers are not present. It also adds a helper
function to read data from a .dynamic key/value pair entry correctly
from the file or from memory.
…sis`. (llvm#114615)

The plugin analysis for `InlineAdvisor` and `InlineOrder` currently
relies on shared global state to keep track if the analysis is
available.
This causes issues when pipelines using plugins and pipelines not using
plugins are run in the same process.
The shared global state can be easily replaced by checking in the given
instance of `ModuleAnalysisManager` if the plugin analysis has been
registered.
…vm#81299)

Add integer promotion support for for VP_LOAD and VP_STORE via legalization of extend
and truncate of each form.

Patch commandeered from: https://reviews.llvm.org/D109377
Use update_llc_test_checks.py to automate the test checks in some files
I was observing changes in locally.
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
…vm#116308)

gfx12 and gfx950 managed to produce 3 different permutations of this feature.
gfx12 supports f32 and f16, and gfx950 supports f32 and v2f16.
vvereschaka and others added 19 commits November 19, 2024 17:42
…uilds. NFC. (llvm#116744)

Forcely disable the libc++ benchmarks on Windows build hosts. The
benchmark configuration currently does not support the cross builds on
Windows hosts.

Also removed unnecessary `CMAKE_CROSSCOMPILING` CMake option.
llvm#104748)

This patch adds parallelization support for the following expression in OpenMP
workshare constructs:

* Elemental procedures in array expressions
…loop_nest (llvm#104748)"

This reverts commit 40c8938.

Linking errors in buildbot build
The situation that required symbol versions on the LLVM shared library
can also happen for clang-cpp, although it is less common: different
tools require different versions of the library, and through transitive
dependencies a process ends up with multiple copies of clang-cpp. This
causes havoc with ELF, because calls meant to go one version of the
library end up with another.

I've also considered introducing a symbol version globally, but for
example the clang (C) library and other targets outside of LLVM/Clang,
e.g. libc++, would not want that. So it's probably best if we keep it to
those libraries.
…lvm#114398)

Depends on llvm#114508

The LoongArch Reference Manual says that the 3-register atomic memory
operations cannot have their rd equal to either rj or rk [^1], and both
GNU as and LLVM IAS enforce the constraint for non-zero rd. However,
currently LoongArch AsmParser is checking for the opcode with a direct
numerical comparison on the opcode, which is enum-typed: the fact that
all AMO insns have adjacent numerical values is merely a coincidence,
and it is better to not rely on the current TableGen implementation
behavior.

Instead, start to leverage the target-specific flags field of
MCInstrDesc, and record the constraint with TableGen, so we can stop
treating the opcode value as number. In doing so, we also have to mark
whether the instruction is AMCAS, because the operand index of rj and rk
for the AMCAS instructions is different.

While documenting the new flag, it was found that v1.10 of the Manual
did not specify the similar constraint for the AMCAS instructions.
Experiments were done on a Loongson 3A6000 (LA664 uarch) and it turned
out that at least AMCAS will still signal INE with `rd == rj`. The `rd
== rk` case should be a no-op according to the semantics, but as it is
meaningless to perform CAS with the "old value" same as the "new value",
it is not worth special-casing. So the current behavior of also
enforcing the constraint for AMCAS is kept.

[^1]: if `rd == rj` an INE would be signaled; if `rd == rk` it is UB.
It seems we can get there with MSVC if LLVM_BUILD_LLVM_DYLIB_VIS is set.
Slightly surprising because I didn't know that MSVC supports the flag
-Bsymbolic-functions, but let's play it safe.
This patch adds computeUndriftMap, a function to compute mappings from
source locations in the MemProf profile to source locations in the IR.
This patch adds MemProfReader::takeMemProfData, a function to return
the complete MemProf profile from the reader.  We can directly pass
its return value to InstrProfWriter::addMemProfData without having to
deal with the indivual components of the MemProf profile.  The new
function is named "take", but it doesn't do std::move yet because of
type differences (DenseMap v.s. MapVector).

The end state I'm trying to get to is roughly as follows:

- MemProfReader accepts IndexedMemProfData as a parameter as opposed
  to the three individual components (frames, call stacks, and
  records).

- MemProfReader keeps IndexedMemProfData as a class member without
  decomposing it into its individual components.

- MemProfReader returns IndexedMemProfData like:

  IndexedMemProfData takeMemProfData() {
    return std::move(MemProfData);
  }
Identified with misc-include-cleaner.
Identified with misc-include-cleaner.
Summary:
Currently, the RPC interface uses a basic opcode to communicate with the
server. This currently is 16 bits. There's no reason for this to be 16
bits, because on the GPU a 32-bit write is the same as a 16-bit write
performance wise.

Additionally, I am now making all the `libc` based opcodes qualified
with the 'c' type, mimiciing how Linux handles `ioctls` all coming from
the same driver. This will make it easier to extend the interface when
it's exported directly.
…lvm#116000)

This piece of code made the program crash.

```Verilog
function pkg::t get
    (int t = 2,
     int f = 2);
```

The way the code is supposed to be parsed is that UnwrappedLineParser
should identify the function header, and then TokenAnnotator should
recognize the result.  But the code in UnwrappedLineParser would
mistakenly not recognize it due to the `::`.  Then TokenAnnotator would
recognize the comma both as TT_VerilogInstancePortComma and
TT_VerilogTypeComma.  The code for annotating the instance port comma
used `setFinalizedType`.  The program would crash when it tried to set
it to another type.

The code in UnwrappedLineParser now recognizes the `::` token.

The are other cases in which TokenAnnotator would recognize the comma as
both of those types, for example if the `function` keyword is removed.
The type is now set using `setType` instead so that the program does not
crash.  The developer no longer knows why he used `setFinalizedType`
back then.
This restores the code to its original state before I experimented
with making i32 a legal type.
The next change will change Partition::phdrs to a unique_ptr vector,
which requires PhdrEntry to be a complete type.

And make OutputSection::getLMA out-of-line, since it should not include
either SyntheticSections.h or Writer.h.
@ergawy ergawy requested review from dpalermo and agozillon and removed request for antiagainst and kuhar November 20, 2024 11:28
@kuhar
Copy link
Member

kuhar commented Nov 20, 2024

@ergawy I think this has just notified ~127 llvm contributors whose changes you are pulling in

@ergawy
Copy link
Author

ergawy commented Nov 20, 2024

@ergawy I think this has just notified ~127 llvm contributors whose changes you are pulling in

I added you by mistake as a reviewer (GH suggestions, removed you right away), maybe that's why you were notified. Hopefully, not everyone was. Sorry if so, not intentional.

@kuhar
Copy link
Member

kuhar commented Nov 20, 2024

I made the guess based on this:
image

@ergawy
Copy link
Author

ergawy commented Nov 20, 2024

I made the guess based on this: ...

Reasonable guess. Hopefully, not all of them were notified and only the 2 people added as reviewers by mistake.

@ergawy ergawy merged commit 109ac78 into ROCm:amd-trunk-dev Nov 21, 2024
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.