|
1 |
| -# The LLVM Compiler Infrastructure |
| 1 | +# Capstone's LLVM with refactored TableGen backends |
2 | 2 |
|
3 |
| -[](https://securityscorecards.dev/viewer/?uri=github.com/llvm/llvm-project) |
4 |
| -[](https://www.bestpractices.dev/projects/8273) |
5 |
| -[](https://github.com/llvm/llvm-project/actions/workflows/libcxx-build-and-test.yaml?query=event%3Aschedule) |
| 3 | +This LLVM version exists to generate code for the
| 4 | +[Capstone disassembler](https://github.com/capstone-engine/capstone). |
6 | 5 |
|
7 |
| -Welcome to the LLVM project! |
| 6 | +It refactors the TableGen emitter backends, so they can emit C code |
| 7 | +in addition to the C++ code they normally emit. |
8 | 8 |
|
9 |
| -This repository contains the source code for LLVM, a toolkit for the |
10 |
| -construction of highly optimized compilers, optimizers, and run-time |
11 |
| -environments. |
| 9 | +Note that within LLVM an architecture is called a `Target`.
12 | 10 |
|
13 |
| -The LLVM project has multiple components. The core of the project is |
14 |
| -itself called "LLVM". This contains all of the tools, libraries, and header |
15 |
| -files needed to process intermediate representations and convert them into |
16 |
| -object files. Tools include an assembler, disassembler, bitcode analyzer, and |
17 |
| -bitcode optimizer. |
| 11 | +## Code generation |
18 | 12 |
|
19 |
| -C-like languages use the [Clang](http://clang.llvm.org/) frontend. This |
20 |
| -component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode |
21 |
| --- and from there into object files, using LLVM. |
| 13 | +### Relevant files |
22 | 14 |
|
23 |
| -Other components include: |
24 |
| -the [libc++ C++ standard library](https://libcxx.llvm.org), |
25 |
| -the [LLD linker](https://lld.llvm.org), and more. |
| 15 | +The TableGen emitter backends are located in `llvm/utils/TableGen/`. |
26 | 16 |
|
27 |
| -## Getting the Source Code and Building LLVM |
| 17 | +The target definition files (`.td`), which define the |
| 18 | +instructions, operands, features etc., can be |
| 19 | +found in `llvm/lib/Target/<ARCH>/`. |
28 | 20 |
|
29 |
| -Consult the |
30 |
| -[Getting Started with LLVM](https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm) |
31 |
| -page for information on building and running LLVM. |
| 21 | +### Code generation overview |
32 | 22 |
|
33 |
| -For information on how to contribute to the LLVM project, please take a look at |
34 |
| -the [Contributing to LLVM](https://llvm.org/docs/Contributing.html) guide. |
| 23 | +Generating code for a target has 6 steps: |
35 | 24 |
|
36 |
| -## Getting in touch |
| 25 | +``` |
| 26 | +                                                                         5                 6
| 27 | +                                                                    ┌──────────┐      ┌──────────┐
| 28 | +                                                                    │Printer   │      │CS .inc   │
| 29 | +    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │
| 30 | +┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘
| 31 | +│ .td   │     │           │     │           │     │ Code-    │  │
| 32 | +│ files ├────►│ TableGen  ├────►│  CodeGen  ├────►│ Emitter  │◄─┤
| 33 | +└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │
| 34 | +                     │                                 ▲        │   ┌──────────┐      ┌──────────┐
| 35 | +                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │
| 36 | +                                                                    │LLVM      │      │files     │
| 37 | +                                                                    └──────────┘      └──────────┘
| 38 | +``` |
37 | 39 |
|
38 |
| -Join the [LLVM Discourse forums](https://discourse.llvm.org/), [Discord |
39 |
| -chat](https://discord.gg/xS7Z362), |
40 |
| -[LLVM Office Hours](https://llvm.org/docs/GettingInvolved.html#office-hours) or |
41 |
| -[Regular sync-ups](https://llvm.org/docs/GettingInvolved.html#online-sync-ups). |
| 40 | +1. LLVM targets are defined in `.td` files. They describe instructions, operands, |
| 41 | +features and other properties. |
42 | 42 |
|
43 |
| -The LLVM project has adopted a [code of conduct](https://llvm.org/docs/CodeOfConduct.html) for |
44 |
| -participants to all modes of communication within the project. |
| 43 | +2. [LLVM TableGen](https://llvm.org/docs/TableGen/index.html) parses these files |
| 44 | +and converts them to an internal representation of [Classes, Records, DAGs](https://llvm.org/docs/TableGen/ProgRef.html) |
| 45 | + and other types. |
| 46 | + |
| 47 | +3. Next, a TableGen component called [CodeGen](https://llvm.org/docs/CodeGenerator.html)
| 48 | +abstracts this even further. |
| 49 | +The result is a representation which is _not_ specific to any target |
| 50 | +(e.g. the `CodeGenInstruction` class can represent a machine instruction of any target). |
| 51 | + |
| 52 | +4. Different code emitter backends use the results of the two previous components to
| 53 | +generate code.
| 54 | + |
| 55 | +5. Whenever the emitter emits code, it calls a `Printer`: either `PrinterCapstone` to emit C or `PrinterLLVM` to emit C++.
| 56 | +The `--printerLang=[CCS,C++]` option passed to `llvm-tblgen` selects which one is used.
| 57 | + |
| 58 | +6. After the emitter backend is done, the `Printer` writes the `output_stream` content into the `.inc` files. |
| 59 | + |
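Steps 4 to 6 can be pictured with the following minimal sketch. It is not the actual implementation: the class names `SketchPrinter`, `SketchPrinterLLVM`, `SketchPrinterCapstone`, the method `emitConstant()` and the helpers `makePrinter()` and `runEmitter()` are all hypothetical. Only the idea that the emitter delegates every piece of output to a printer, and the language values `CCS` and `C++`, are taken from the description above.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical printer interface: the emitter never writes text itself,
// it asks the printer to do it.
class SketchPrinter {
public:
  explicit SketchPrinter(std::ostream &OS) : OS(OS) {}
  virtual ~SketchPrinter() = default;
  virtual void emitConstant(const std::string &Name, unsigned Value) = 0;

protected:
  std::ostream &OS;
};

// Stand-in for the C++ printer (invented output format).
class SketchPrinterLLVM : public SketchPrinter {
public:
  using SketchPrinter::SketchPrinter;
  void emitConstant(const std::string &Name, unsigned Value) override {
    OS << "static constexpr unsigned " << Name << " = " << Value << ";\n";
  }
};

// Stand-in for the C printer (invented output format).
class SketchPrinterCapstone : public SketchPrinter {
public:
  using SketchPrinter::SketchPrinter;
  void emitConstant(const std::string &Name, unsigned Value) override {
    OS << "#define " << Name << " " << Value << "\n";
  }
};

// Mirrors the role of the printer language option: pick a printer by name.
std::unique_ptr<SketchPrinter> makePrinter(const std::string &Lang,
                                           std::ostream &OS) {
  if (Lang == "CCS")
    return std::make_unique<SketchPrinterCapstone>(OS);
  if (Lang == "C++")
    return std::make_unique<SketchPrinterLLVM>(OS);
  throw std::invalid_argument("unknown printer language: " + Lang);
}

// Hypothetical emitter backend: it decides *what* to emit,
// the printer decides *how* it looks in the target language.
std::string runEmitter(const std::string &Lang) {
  std::ostringstream Buffer; // plays the role of the output stream
  auto P = makePrinter(Lang, Buffer);
  P->emitConstant("NUM_INSTRUCTIONS", 3);
  return Buffer.str(); // this text would end up in an .inc file
}
```

Calling `runEmitter("CCS")` and `runEmitter("C++")` runs the same emitter logic twice and yields C or C++ text respectively, which is the point of separating emitter and printer.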
| 60 | +### Emitter backends and their use cases |
| 61 | + |
| 62 | +We use the following emitter backends:
| 63 | + |
| 64 | +| Name | Generated Code | Note | |
| 65 | +|------|----------------|------| |
| 66 | +| AsmMatcherEmitter | Mapping tables for Capstone | | |
| 67 | +| AsmWriterEmitter | State machine to decode the asm-string for a `MCInst` | | |
| 68 | +| DecoderEmitter | State machine which decodes bytes to a `MCInst`. | | |
| 69 | +| InstrInfoEmitter | Tables with instruction information (instruction enum, instr. operand information...) | | |
| 70 | +| RegisterInfoEmitter | Tables with register information (register enum, register type info...) | | |
| 71 | +| SubtargetEmitter | Table about the target features. | | |
| 72 | +| SearchableTablesEmitter | Usually used to generate tables and decoding functions for system registers. | **1.** Not all targets use this. | |
| 73 | +| | | **2.** Backend can't access the target name. Wherever the target name is needed `__ARCH__` or `##ARCH##` is printed and later replaced. | |
| 74 | + |
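The placeholder mechanism mentioned for `SearchableTablesEmitter` can be illustrated with a small sketch. The README does not say where or how the replacement is performed, so the functions `replaceAll()` and `fillTargetName()` below are hypothetical; they only show the kind of text substitution that turns `__ARCH__` and `##ARCH##` into a concrete target name.

```cpp
#include <string>
#include <utility>

// Replaces every occurrence of Placeholder in Text with Replacement.
std::string replaceAll(std::string Text, const std::string &Placeholder,
                       const std::string &Replacement) {
  if (Placeholder.empty())
    return Text; // nothing to search for
  std::string::size_type Pos = 0;
  while ((Pos = Text.find(Placeholder, Pos)) != std::string::npos) {
    Text.replace(Pos, Placeholder.size(), Replacement);
    Pos += Replacement.size(); // continue after the inserted name
  }
  return Text;
}

// Hypothetical post-processing step: fill in the target name for both
// placeholder spellings mentioned in the table above.
std::string fillTargetName(std::string Generated, const std::string &Arch) {
  Generated = replaceAll(std::move(Generated), "__ARCH__", Arch);
  return replaceAll(std::move(Generated), "##ARCH##", Arch);
}
```

For instance, `fillTargetName("##ARCH##_SYSREG", "AArch64")` returns `AArch64_SYSREG`. The identifier `AArch64_SYSREG` is made up for this example.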
| 75 | +## Developer notes |
| 76 | + |
| 77 | +- If you find C++ code within the generated files, you need to extend `PrinterCapstone::translateToC()`.
| 78 | +If this still doesn't fix the problem, the code snippet wasn't passed through `translateToC()` before emitting.
| 79 | +So you need to figure out where this specific code snippet is printed and add a `translateToC()` call there.
| 80 | + |
| 81 | +- If the mapping files lack operand types or access information, then the `.td` files are incomplete (this happens surprisingly often).
| 82 | +You need to search for the instruction or operands with missing or incorrect values and fix them. |
| 83 | + ``` |
| 84 | + Wrong access attributes for: |
| 85 | + - Registers, Immediates: The instruction defines "out" and "in" operands incorrectly.
| 86 | + - Memory: The "mayLoad" or "mayStore" variable is not set for the instruction. |
| 87 | +
|
| 88 | + Operand type is invalid: |
| 89 | + - The "OperandType" variable is unset for this operand type. |
| 90 | + ``` |
| 91 | + |
| 92 | +- If certain target features (e.g. architecture extensions) were removed from LLVM or you want to add your own, |
| 93 | +check out [DeprecatedFeatures.md](DeprecatedFeatures.md).