Auto Sync 18 #45

Merged
merged 53 commits into from
May 29, 2024
ccffacd
Rebase refactored TableGen backends onto LLVM 18.
Rot127 Oct 28, 2022
6235952
Add script to check syntax and compare LLVM upstream and our tables.
Rot127 Mar 3, 2024
f9fd852
Add CI jobs from main auto-sync branch again.
Rot127 Mar 3, 2024
2624e45
Add table compare script to CI.
Rot127 Mar 3, 2024
58187dd
Add missing commandline options for refactored backends.
Rot127 Mar 12, 2024
8fd53ce
Add missing closing brackets in PPC def files.
Rot127 Mar 12, 2024
04b6947
Update our def files to follow https://github.com/llvm/llvm-project/c…
Rot127 Mar 12, 2024
e761060
Add a rebuild options and print more logs
Rot127 Mar 12, 2024
cfa6207
Blind fix for Github CI build
Rot127 Mar 12, 2024
36dce65
Fix incorrect use of variable.
Rot127 Mar 12, 2024
419be5c
Don't dump output of build in /dev/null
Rot127 Mar 12, 2024
7d40471
Use Python provided cmake and Ninja for CI build and select gnu compi…
Rot127 Mar 19, 2024
d8f704f
Fix some incorrect generated LLVM code.
Rot127 Mar 19, 2024
2b4afeb
Build debug llvm-tblgen in CI
Rot127 Mar 19, 2024
c5650c7
Add build instructions
Rot127 Mar 19, 2024
627a665
Separate generating of tables into different scripts so we can use Gi…
Rot127 Mar 19, 2024
1d3d43e
Fix workflows
Rot127 Mar 19, 2024
3438dec
Fix some incorrectly generated source code after rebase.
Rot127 Mar 20, 2024
203b857
Make gen scripts use the repository root dir.
Rot127 Mar 20, 2024
8301b60
Remove syntax check, because it doesn't work that easy.
Rot127 Mar 20, 2024
e483b81
Fix mismatches in generated C++ code.
Rot127 Mar 20, 2024
e976368
Fix mismatch in C++ Subtarget files
Rot127 Mar 20, 2024
b832a2d
Fix mismatch in generated C++ SystemOperands files.
Rot127 Mar 20, 2024
efb7921
Fix no-return values warning from compiler.
Rot127 Apr 10, 2024
47557b8
Remove instruction encoding information.
Rot127 Apr 24, 2024
90224e3
Enable EmitMapTable to print C tables.
Rot127 Apr 24, 2024
cfc0a3e
Extends docs
Rot127 Apr 25, 2024
64473f4
Add missing include guard to ignore list.
Rot127 Apr 25, 2024
d679d0d
Remove asserts
Rot127 Apr 25, 2024
1e71ccf
Define the InstrTable as own type
Rot127 Apr 25, 2024
db503ca
Add default argument for printSVERegOp
Rot127 Apr 25, 2024
66a0d13
Generate InstDecs tables with references to the OpInfo structs.
Rot127 Apr 25, 2024
501f6db
Format code
Rot127 Apr 25, 2024
34b2b4a
Fix: Don't return NULL for a struct
Rot127 Apr 25, 2024
ae30930
Fix template function translation
Rot127 Apr 25, 2024
cce0edb
Add MatrixIndex_... to the OP_GROUP list
Rot127 Apr 25, 2024
f21eecd
Add AdrLabel and AdrpLable to OP_GROUPS
Rot127 Apr 25, 2024
120cef7
Remove check for same name, different signature functions.
Rot127 Apr 30, 2024
27da8ea
Assign enum value to the raw_val member to prevent compiler warnings.
Rot127 Apr 30, 2024
ee2e109
Initialize DecoderComplete flag in generated decoder function.
Rot127 May 15, 2024
62835c2
Check in patterns for memory operand properties.
Rot127 May 15, 2024
785382b
Add memory access info as supplementary AArch64 info
Rot127 May 15, 2024
35363ba
Fix regex pattern to not match operand names between ] and [
Rot127 May 16, 2024
5c36e67
Generate BOUND flags for SME operands.
Rot127 May 21, 2024
e8e7ad5
Add LoongArch support
jiegec Jan 23, 2024
2614f48
Handle multiple template arguments in handleDefaultArg
jiegec May 3, 2024
ae04f86
Change RegDiffLists type to MCPhysReg
jiegec May 3, 2024
9eb6e3e
Avoid using llvm_unreachable
jiegec May 3, 2024
dd3384a
Assign OPERAND_IMMEDIATE as OperandType of BareSymbol
Rot127 May 3, 2024
75ac2d7
Handle INVALID_SIMPLE_VALUE_TYPE in getEnumName
jiegec May 3, 2024
ad35649
Set OperandType to OPERAND_IMMEDIATE for immediate operands
jiegec May 3, 2024
fb9cfc5
Emit formats enum and supplemental info for LoongArch
jiegec May 4, 2024
3229ea2
Rename CS_AC_READ_WRTE to CS_AC_READ_WRITE
jiegec May 19, 2024
116 changes: 116 additions & 0 deletions .github/workflows/LLVM-Auto-Updater.yml
@@ -0,0 +1,116 @@
name: Weekly-LLVM-Release-Update
on:
  workflow_dispatch:
    inputs:
      tag_name:
        description: "LLVM Version"
        required: false
  schedule:
    - cron: "0 0 * * 1"

permissions:
  contents: write

jobs:
  merge-llvm:
    runs-on: ubuntu-latest
    outputs:
      branch_version: ${{ steps.step1.outputs.branch_version }}
    steps:
      - name: Get LLVM version
        id: step1
        run: |
          if [[ -z $github_event_inputs_tag_name ]]; then
            tag_name=$(curl -sL https://api.github.com/repos/llvm/llvm-project/releases/latest | jq -r '.tag_name')
          else
            tag_name=$github.event.inputs.tag_name
          fi

          echo "Using version: $tag_name"
          echo "tag_name=${tag_name}" >> $GITHUB_ENV
          version=$(echo $tag_name | grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+')
          echo "version=${version}" >> $GITHUB_ENV
          major_version=$(echo $version | awk -F '.' '{print $1}')
          echo "major_version=${major_version}" >> $GITHUB_ENV
          branch_version='release/'$major_version'.x'
          echo "branch_version=${branch_version}" >> $GITHUB_ENV
          is_official_release=$(curl -sL https://api.github.com/repos/llvm/llvm-project/releases/latest | jq -r '.prerelease')
          echo "is_official_release=${is_official_release}" >> $GITHUB_ENV
          echo "branch_version=${branch_version}" >> $GITHUB_OUTPUT

      - name: Ensure official release
        run: |
          if [[ "${{ env.is_official_release }}" != "false" ]]; then
            exit 0
          fi

      - name: Checkout LLVM-project
        uses: actions/checkout@v3
        with:
          fetch-depth: 1

      - name: Sparse checkout LLVM-project
        run: |
          git clone --depth 1 --filter=blob:none --sparse --branch ${{ env.branch_version }} https://github.com/llvm/llvm-project.git ../llvm-project
          git config --local user.email "github-actions[bot]@users.noreply.github.com"
          git config --local user.name "github-actions[bot]"
          cd ../llvm-project
          echo "sparse checkout"
          git sparse-checkout set llvm/ cmake/ third-party/ .github/
          rm -rf .git/ .github/
          ls -la
          cd -

      - name: Add files to branch ${{ env.branch_version }}
        run: |
          branch_exists=$(git ls-remote --exit-code --heads origin ${{env.branch_version}})
          echo "branch_exists=${branch_exists}" >> $GITHUB_ENV
          git checkout ${{ env.branch_version }} 2>/dev/null || git checkout -b ${{ env.branch_version }}
          if [[ "${{ env.branch_exists }}" = true ]]; then
            git pull origin ${{ env.branch_version }}
            git push -u origin ${{ env.branch_version }}
            rm -rf !\(.git\|.github\)
          fi
          cp -r ../llvm-project/* .
          ls -la
          git add .

      - name: Update branch ${{ env.branch_version }}
        if: ${{ env.branch_exists }}
        run: |
          echo "Branch already exists, update."
          if [[ `git status --porcelain` ]]; then
            git commit -m "Update LLVM ${{ env.branch_version }}"
          else
            echo "No changes to commit."
            exit 0
          fi

      - name: Commit new branch ${{ env.branch_version }}
        if: ${{ ! env.branch_exists }}
        run: |
          git commit -m "Add LLVM ${{ env.branch_version }}"

      - name: Push new branch
        run: |
          git push origin ${{ env.branch_version }} -f

  build-llvm-tblgen:
    runs-on: ubuntu-latest
    needs: merge-llvm
    env:
      branch_version: ${{ needs.merge-llvm.outputs.branch_version }}
    steps:
      - uses: lukka/get-cmake@latest

      - name: Checkout llvm-capstone
        uses: actions/checkout@v3
        with:
          ref: ${{ env.branch_version }}

      - name: Build llvm tblgen
        run: |
          mkdir build
          cd build
          cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../llvm
          cmake --build . --target llvm-tblgen --config Debug
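
The "Get LLVM version" step above derives a release branch name from a release tag using `grep` and `awk`. As a minimal sketch of the same derivation, the Python below extracts the `x.y.z` version and builds `release/<major>.x`. The function name `release_branch` is hypothetical, and the tag `llvmorg-18.1.6` is only an example input; neither is part of the workflow.

```python
import re


def release_branch(tag_name: str) -> str:
    """Derive a release branch such as 'release/18.x' from a release tag.

    Mirrors the grep/awk pipeline of the workflow step: take the first
    x.y.z version found in the tag and keep only its major component.
    """
    match = re.search(r"[0-9]+\.[0-9]+\.[0-9]+", tag_name)
    if match is None:
        raise ValueError(f"no x.y.z version in tag: {tag_name!r}")
    major_version = match.group(0).split(".")[0]
    return f"release/{major_version}.x"


# Example input only; the real tag comes from the GitHub releases API.
print(release_branch("llvmorg-18.1.6"))  # prints release/18.x
```

Unlike the shell version, this sketch fails loudly when the tag contains no version, instead of producing `release/.x`.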
78 changes: 78 additions & 0 deletions .github/workflows/LLVM-Tblgen-Build.yml
@@ -0,0 +1,78 @@
name: LLVM-Tblgen-Build
on:
  workflow_dispatch:
  push:
    paths-ignore:
      - ".github/workflows/LLVM-Auto-Updater.yml"
      - "CONTRIBUTING.md"
      - "README.md"
      - "LICENSE.TXT"
      - "SECURITY.md"
    branches:
      - auto-sync*
  pull_request:
    paths-ignore:
      - ".github/workflows/LLVM-Auto-Updater.yml"
      - "CONTRIBUTING.md"
      - "README.md"
      - "LICENSE.TXT"
      - "SECURITY.md"
    branches:
      - auto-sync*
      - release/**

jobs:
  build-and-test-llvm-tblgen:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout llvm-capstone (patched backends)
        uses: actions/checkout@v4
        with:
          clean: false

      - name: Set up Python
        uses: actions/setup-python@v4

      - name: Install dependencies
        run: pip install cmake Ninja

      - name: Build patched llvm-tblgen
        run: |
          mkdir build
          cd build
          cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../llvm
          cmake --build . --target llvm-tblgen --config Debug
          cd ..

      - name: Generate Capstone tables
        run: |
          ./gen_cs_tables.sh

      - name: Checkout LLVM
        uses: actions/checkout@v4
        with:
          clean: false
          ref: auto-sync-18-base

      - name: Build LLVM llvm-tblgen
        run: |
          rm -rf build
          mkdir build
          cd build
          cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../llvm
          cmake --build . --target llvm-tblgen --config Debug
          cd ..

      - name: Checkout llvm-capstone (patched backends)
        uses: actions/checkout@v4
        with:
          clean: false

      - name: Generate original LLVM tables
        run: |
          ./gen_llvm_tables.sh

      - name: Compare LLVM and Capstone tables and syntax
        run: |
          ./compare_tblgen_output.sh
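
The last step above compares the tables generated by upstream `llvm-tblgen` with those from the patched one. The contents of `compare_tblgen_output.sh` are not shown here, so the following is only a sketch of one way such a check could work, assuming it is a file-by-file content comparison of `.inc` files in two output directories. The function name `differing_tables` and both directory arguments are hypothetical.

```python
import filecmp
from pathlib import Path


def differing_tables(llvm_dir: str, capstone_dir: str) -> list[str]:
    """Return the names of .inc files present in both directories
    whose contents differ (or that could not be compared)."""
    common = sorted(
        path.name
        for path in Path(llvm_dir).glob("*.inc")
        if (Path(capstone_dir) / path.name).is_file()
    )
    # shallow=False forces a byte-by-byte comparison instead of
    # trusting file size and modification time.
    _, mismatch, errors = filecmp.cmpfiles(
        llvm_dir, capstone_dir, common, shallow=False
    )
    return sorted(mismatch + errors)
```

A CI step built on this idea would fail the job whenever the returned list is non-empty.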

41 changes: 41 additions & 0 deletions DeprecatedFeatures.md
@@ -0,0 +1,41 @@
# Deprecated Features

Capstone needs to support features which were removed by LLVM in the past.
Here we explain how to reintroduce them.

## Reintroduction

To get the old features back, we copy them from the old `.td` files and include them in the new ones.

To include removed features from previous LLVM versions do the following:

1. Check out the last LLVM version in which the feature was present.
2. Copy all feature-related definitions into a `<ARCH>Deprecated.td` file.
3. Check out the newest LLVM version again.
4. Wrap the different definition types in include guards. For example, the `InstrInfo` definitions could be wrapped in:

```
#ifndef INCLUDED_CAPSTONE_DEPR_INSTR
#ifdef CAPSTONE_DEPR_INSTR
#define INCLUDED_CAPSTONE_DEPR_INSTR // Ensures it is only included once

[Instruction definitions of removed feature]

#endif // CAPSTONE_DEPR_INSTR
#endif // INCLUDED_CAPSTONE_DEPR_INSTR
```

_Note that the order of `#ifndef` and `#ifdef` matters (otherwise you'll get an error from `tblgen`)._

5. Include the definitions in the current definition files with:

```
#define CAPSTONE_DEPR_INSTR
include "<ARCH>Deprecated.td"
```
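
Steps 4 and 5 can be partly scripted. The sketch below wraps a block of copied definitions in the guard pattern shown above. The helper name `wrap_deprecated` and its `kind` parameter are hypothetical and not part of this repository; only the macro naming (`CAPSTONE_DEPR_INSTR`, `INCLUDED_CAPSTONE_DEPR_INSTR`) is taken from the example.

```python
def wrap_deprecated(definitions: str, kind: str = "INSTR") -> str:
    """Wrap TableGen definitions in the two-level include guard.

    The outer #ifndef ensures the block is included only once, the
    inner #ifdef enables it only where the including file asks for it.
    """
    enable_macro = f"CAPSTONE_DEPR_{kind}"
    once_macro = f"INCLUDED_{enable_macro}"
    return "\n".join(
        [
            f"#ifndef {once_macro}",
            f"#ifdef {enable_macro}",
            f"#define {once_macro} // Ensures it is only included once",
            "",
            definitions.strip("\n"),
            "",
            f"#endif // {enable_macro}",
            f"#endif // {once_macro}",
            "",
        ]
    )


# Placeholder body; real input would be the copied instruction definitions.
print(wrap_deprecated("[Instruction definitions of removed feature]"))
```

The `#ifndef` is emitted before the `#ifdef` on purpose, matching the order that the note above says `tblgen` requires.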

## Notes
- You may have to change some definitions slightly,
because certain classes no longer exist or were replaced (e.g.: `GCCBuiltin` -> `ClangBuiltin`).
- Some new processors might need to have the feature flag (`Has<DeprecatedFeature>`) added
to their `UnsupportedFeatures` list.
146 changes: 114 additions & 32 deletions README.md
@@ -1,44 +1,126 @@
# The LLVM Compiler Infrastructure
# Capstone's LLVM with refactored TableGen backends

[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/llvm/llvm-project/badge)](https://securityscorecards.dev/viewer/?uri=github.com/llvm/llvm-project)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/8273/badge)](https://www.bestpractices.dev/projects/8273)
[![libc++](https://github.com/llvm/llvm-project/actions/workflows/libcxx-build-and-test.yaml/badge.svg?branch=main&event=schedule)](https://github.com/llvm/llvm-project/actions/workflows/libcxx-build-and-test.yaml?query=event%3Aschedule)
This LLVM version exists to generate code for the
[Capstone disassembler](https://github.com/capstone-engine/capstone).

Welcome to the LLVM project!
It refactors the TableGen emitter backends, so they can emit C code
in addition to the C++ code they normally emit.

This repository contains the source code for LLVM, a toolkit for the
construction of highly optimized compilers, optimizers, and run-time
environments.
## Build

The LLVM project has multiple components. The core of the project is
itself called "LLVM". This contains all of the tools, libraries, and header
files needed to process intermediate representations and convert them into
object files. Tools include an assembler, disassembler, bitcode analyzer, and
bitcode optimizer.
```
python3 -m venv .venv
source .venv/bin/activate
pip install Ninja cmake
mkdir build
cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../llvm
cmake --build . --target llvm-tblgen --config Debug
```

C-like languages use the [Clang](http://clang.llvm.org/) frontend. This
component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode
-- and from there into object files, using LLVM.
## Code generation

Other components include:
the [libc++ C++ standard library](https://libcxx.llvm.org),
the [LLD linker](https://lld.llvm.org), and more.
Please note that within LLVM, a `Target` refers to an architecture.

## Getting the Source Code and Building LLVM
### Relevant files

Consult the
[Getting Started with LLVM](https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm)
page for information on building and running LLVM.
The TableGen emitter backends are located in `llvm/utils/TableGen/`.

For information on how to contribute to the LLVM project, please take a look at
the [Contributing to LLVM](https://llvm.org/docs/Contributing.html) guide.
The target definition files (`.td`), which define the
instructions, operands, features etc., can be
found in `llvm/lib/Target/<ARCH>/`.

## Getting in touch
### Code generation overview

Join the [LLVM Discourse forums](https://discourse.llvm.org/), [Discord
chat](https://discord.gg/xS7Z362),
[LLVM Office Hours](https://llvm.org/docs/GettingInvolved.html#office-hours) or
[Regular sync-ups](https://llvm.org/docs/GettingInvolved.html#online-sync-ups).
Generating code for a target has 6 steps:

The LLVM project has adopted a [code of conduct](https://llvm.org/docs/CodeOfConduct.html) for
participants to all modes of communication within the project.
```
                                                                         5                 6
                                                                    ┌──────────┐      ┌──────────┐
                                                                    │Printer   │      │CS .inc   │
    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │
┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘
│ .td   │     │           │     │           │     │ Code-    │  │
│ files ├────►│ TableGen  ├────►│  CodeGen  ├────►│ Emitter  │◄─┤
└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │
                     │                                 ▲        │   ┌──────────┐      ┌──────────┐
                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │
                                                                    │LLVM      │      │files     │
                                                                    └──────────┘      └──────────┘
```

1. LLVM targets are defined in `.td` files. They describe instructions, operands,
features and other properties.

2. [LLVM TableGen](https://llvm.org/docs/TableGen/index.html) parses these files
and converts them to an internal representation of [Classes, Records, DAGs](https://llvm.org/docs/TableGen/ProgRef.html)
and other types.

3. Next, a TableGen component called [CodeGen](https://llvm.org/docs/CodeGenerator.html)
abstracts this representation even further.
The result is a representation which is _not_ specific to any target
(e.g. the `CodeGenInstruction` class can represent a machine instruction of any target).

4. Different code emitter backends use the result of the former two components to
generate code.

5. Whenever the emitter emits code, it calls a `Printer`: either `PrinterCapstone` to emit C or `PrinterLLVM` to emit C++.
Which one is used is controlled by the `--printerLang=[CCS,C++]` option passed to `llvm-tblgen`.

6. After the emitter backend is done, the `Printer` writes the `output_stream` content into the `.inc` files.
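
Steps 4 to 6 describe a split between an emitter, which decides what to emit, and a printer, which decides how it looks in the target language. The following Python sketch illustrates only that split. The real backends are C++, and everything here except the names `PrinterLLVM`, `PrinterCapstone`, `output_stream` and the option values `CCS` and `C++` is hypothetical, including the method names and the shape of the emitted code.

```python
from abc import ABC, abstractmethod


class Printer(ABC):
    """Collects emitted code; the emitter never formats code itself."""

    def __init__(self) -> None:
        self.output_stream: list[str] = []

    @abstractmethod
    def emit_reg_enum(self, target: str, regs: list[str]) -> None: ...

    def write(self) -> str:
        """Step 6: flush the buffered output (here: return it as text)."""
        return "\n".join(self.output_stream) + "\n"


class PrinterLLVM(Printer):
    def emit_reg_enum(self, target: str, regs: list[str]) -> None:
        # C++ flavour: names are scoped by a namespace.
        self.output_stream.append(f"namespace {target} {{")
        self.output_stream.append("enum {")
        self.output_stream += [f"  {r} = {i}," for i, r in enumerate(regs)]
        self.output_stream += ["};", "}"]


class PrinterCapstone(Printer):
    def emit_reg_enum(self, target: str, regs: list[str]) -> None:
        # C flavour: no namespaces, so the target becomes a name prefix.
        self.output_stream.append("enum {")
        self.output_stream += [
            f"  {target}_{r} = {i}," for i, r in enumerate(regs)
        ]
        self.output_stream.append("};")


# Stand-in for the --printerLang option.
PRINTERS = {"C++": PrinterLLVM, "CCS": PrinterCapstone}


def register_info_emitter(printer: Printer, target: str, regs: list[str]) -> str:
    """Steps 4 and 5: the emitter walks its data and calls the printer."""
    printer.emit_reg_enum(target, regs)
    return printer.write()


print(register_info_emitter(PRINTERS["CCS"](), "AArch64", ["X0", "X1"]))
```

Because the emitter only talks to the `Printer` interface, the same emitter logic serves both output languages.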

### Emitter backends and their use cases

We use the following emitter backends:

| Name | Generated Code | Note |
|------|----------------|------|
| AsmMatcherEmitter | Mapping tables for Capstone | |
| AsmWriterEmitter | State machine to decode the asm-string for a `MCInst` | |
| DecoderEmitter | State machine which decodes bytes to a `MCInst`. | |
| InstrInfoEmitter | Tables with instruction information (instruction enum, instr. operand information...) | |
| RegisterInfoEmitter | Tables with register information (register enum, register type info...) | |
| SubtargetEmitter | Table about the target features. | |
| SearchableTablesEmitter | Usually used to generate tables and decoding functions for system registers. | **1.** Not all targets use this. |
| | | **2.** Backend can't access the target name. Wherever the target name is needed `__ARCH__` or `##ARCH##` is printed and later replaced. |
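
Note 2 in the table says the `__ARCH__` and `##ARCH##` placeholders are replaced later. How and where that replacement happens is not shown here, so the snippet below is only a sketch of such a post-processing pass. The function name `fill_arch_placeholders` and the sample line are hypothetical.

```python
def fill_arch_placeholders(generated: str, arch: str) -> str:
    """Replace both placeholder spellings with the target name."""
    return generated.replace("##ARCH##", arch).replace("__ARCH__", arch)


# Made-up line in the style of generated output.
sample = "const __ARCH__SysReg *##ARCH##_lookup(void);"
print(fill_arch_placeholders(sample, "AArch64"))
# prints: const AArch64SysReg *AArch64_lookup(void);
```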

## Developer notes

- If you find C++ code within the generated files, you need to extend `PrinterCapstone::translateToC()`.
If this still doesn't fix the problem, the code snippet wasn't passed through `translateToC()` before emitting.
In that case, figure out where this specific code snippet is printed and add a `translateToC()` call there.

- Template functions with default values for their arguments don't get replaced properly.
See `handleDefaultArg()` in `PrinterCapstone.cpp` to add the default argument value.

- Some operand printers or decoders are not recognized. This shows up as a compiler warning like:
```
.../AArch64GenAsmWriter.inc:18216:5: warning: implicit declaration of function ‘printMatrixIndex_1’; did you mean ‘printMatrix_0’? [-Wimplicit-function-declaration]
18216 | printMatrixIndex_1(MI, 2, O);
| ^~~~~~~~~~~~~~~~~~
| printMatrix_0

```
In this case the function declaration is probably missing in the header (e.g. `<ARCH>InstPrinter.h`). To fix this, copy the `DEFINE_printMatrix()` function to the header
and rewrite it as a declaration. Check the other `DECLARE_...` macros in the header file for reference.

- An `ARCH_OP_GROUP_...` is missing or not generated. This shows up as a build error like:
```
AArch64InstPrinter.c:2249:42: error: ‘AArch64_OP_GROUP_MatrixIndex_8’ undeclared (first use in this function); did you mean ‘AArch64_OP_GROUP_MatrixIndex’?
2249 | add_cs_detail(MI, CONCAT(AArch64_OP_GROUP_MatrixIndex, Scale), \
```
Fix it by adding the postfix `MatrixIndex_8` to one of the exception lists in `PrinterCapstone::printOpPrintGroupEnum()`.

- If the mapping files are missing operand types or access information, the `.td` files are incomplete (this happens surprisingly often).
You need to search for the instructions or operands with missing or incorrect values and fix them.
```
Wrong access attributes for:
- Registers, Immediates: The instruction defines "out" and "in" operands incorrectly.
- Memory: The "mayLoad" or "mayStore" variable is not set for the instruction.

Operand type is invalid:
- The "OperandType" variable is unset for this operand type.
```

- If certain target features (e.g. architecture extensions) were removed from LLVM or you want to add your own,
check out [DeprecatedFeatures.md](DeprecatedFeatures.md).
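
For the first note above (C++ code left in generated files), the sketch below shows the general idea of a pattern-based C++ to C rewrite, plus a small scanner that lists lines still containing C++-only tokens. The real rules inside `PrinterCapstone::translateToC()` are not shown in this document, so the two rules and the token list here are purely illustrative assumptions, and both function names are hypothetical.

```python
import re

# Illustrative rewrite rules only; not the rules used by translateToC().
RULES = [
    (re.compile(r"\bnullptr\b"), "NULL"),
    # Scope operator between identifiers: AArch64::X0 -> AArch64_X0
    (re.compile(r"(?<=\w)::(?=\w)"), "_"),
]

# Tokens that should never appear in C output (assumed, incomplete list).
CPP_ONLY = re.compile(r"::|\bnullptr\b|\bstatic_cast\b|\btemplate\b")


def translate_to_c(snippet: str) -> str:
    """Apply every rewrite rule in order to a code snippet."""
    for pattern, replacement in RULES:
        snippet = pattern.sub(replacement, snippet)
    return snippet


def leftover_cpp(generated: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that still look like C++."""
    return [
        (number, line)
        for number, line in enumerate(generated.splitlines(), start=1)
        if CPP_ONLY.search(line)
    ]


print(translate_to_c("return AArch64::X0 ? nullptr : p;"))
# prints: return AArch64_X0 ? NULL : p;
```

Running a scanner like `leftover_cpp()` over a generated `.inc` file points to the snippets that either need a new rule or were never passed through the translation.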