
Commit 7d72151

Rebase refactored TableGen backends onto LLVM 18.
The MCInstDesc table changed. Besides this, only minor changes were made, and some additional code is now emitted for LLVM. This commit combines all previous Auto-Sync commits. Their commit messages follow:

-----------

Combination of all commits of the refactored TableGen backends. These are the changes made for LLVM 16.

- Refactor Capstone-relevant TableGen emitter backends. This commit extracts the code which emits generated tables into two printer classes. The Printer is called whenever actual code is written to a file. There is the PrinterLLVM, which emits the code as before, and the PrinterCapstone, which is tailored to our needs (emitting C and generating more info). Additionally, missing memory access properties were added to ARM's td files.
- Emit a single header for all files.
- Capitalize target name for enums.
- Add lay metric to emit enum value for Banked and system regs.
- Malloc substr.
- Sort instructions in ascending order.
- Free substr after use.
- Add vanished constraints.
- Fix `regInfoEmitEnums()` and indent.
- Fix `GenDisassemblerTables.inc#checkDecoderPredicate()`.
- Fix `TriCoreGenRegisterInfo.inc` | `PrinterCapstone::regInfoEmitRegClasses`.
- Revert changes to NEON instructions.
- Add instructions with duplicate operands as Matchables.
- Add memory load and store info.
- Correct memory access and out operand info.
- Set register lists again as read ops due to llvm/llvm-project#62455.
- Make printAliasInstr and getMnemonic static.
- Generate CS instruction enums from the actual mnemonic, not via the flawed AsmMatcher.
- Fix typo in InstrInfoEmitter.cpp.
- Add deprecated QPX feature.
- Replace + and - with p and m.
- Add AssemblerPredicates to PPC.
- Generate RegEncodingTable.
- Define functions which are called by the Mapper as static. Necessary because these functions are present in each arch.
- Remove set_mem_access(). The cases where this is used to mark access to actual memory operands are either very rare, or those are NEON lane indices.
- Generate correct op type for absolute addresses.
- Check for RegisterPointer operands first to prevent mis-categorization.
- Add missing operand types.
- Generate instruction formats for PPC.
- Add Paired Single instructions.
- Partly revert 94e41ce (re-introduces accidentally removed code).
- Set correct operand types for PS operands.
- Add memory read/write attributes.
- Add missing operand types.
- Add mayLoad and mayStore information.
- Add documentation.
- Handle special AArch64 operand.
- Replace C++ with C code.
- Check for duplicate enum instr. names.
- Check for duplicate definitions of system registers.
- Add note about missing target names.
- Resolve templates in a single static method and add docs about it.
- Revert printing target name in upper case.
- Partially revert C++ syntax fixes in .td files. They break the TemplateCollector, since it searches for exactly those references but can't find any.
- Add all SubtargetFeatures to the feature enum, not just the ones used by CGIs.
- Pass Decoder.
- Enable checking specific table fields to determine if the reg enum must be emitted.
- Allow adding a namespace to the type name.
- Formatting.
- Rework emitting of tables. The system operands are now emitted in reg, imm and alias groups. Also, a bug was fixed which emitted incorrect code.
- Check for rename IMPLICIT_IMM operand types.
- Pass DecodeComplete as pointer, not as reference.
- Print undef when it needs to be printed.
- Add namespace IDs to all types and functions.
- Rework C translation.
- Pass MCOp as pointer, not as ref.
- Add missing SysImm type.
- Fix syntax mistakes.
- Generate additional sys immediates and op groups.
- Handle edge case for printSVERegOp.
- Handle default arguments of template functions.
- Add two missing op groups.
- Generate a static RecEncodingTable.
- Set enum values to encodings of the sys ops.
- Generate a single enum value file for system operands.
- Replace system operand groups with their operand types.
- Fix missing braces warning.
- Emit MCOperand validator.
- Emit lookupByName functions for sys operands.
- Add namespaces for ARM.
- Check for Target if default arguments of template functions are resolved.
- auto-sync opcode & operand encoding info generation (#14)
  * Added operand and opcode info generation.
  * Wrapped deprecated macro under an IntelliSense check. IntelliSense fails on it, causing multiple errors in other files, so when IntelliSense parses the code it uses a different version of the macro.
  * Fixed a small bug.
  * Used double braces to prevent an old bug.
  * Removed extra new line and fixed a bug regarding move semantics.
1 parent 6c90f8d commit 7d72151


53 files changed, +17929 / -6921 lines changed

DeprecatedFeatures.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Deprecated Features
+
+Capstone needs to support features which were removed by LLVM in the past.
+Here we explain how to reintroduce them.
+
+## Reintroduction
+
+To get the old features back, we copy them from the old `.td` files and include them in the new ones.
+
+To include removed features from previous LLVM versions, do the following:
+
+1. Check out the last LLVM version in which the feature was present.
+2. Copy all feature-related definitions into a `<ARCH>Deprecated.td` file.
+3. Check out the newest LLVM version again.
+4. Wrap the different definition types in include guards. For example, the `InstrInfo` definitions could be included in:
+
+```
+#ifndef INCLUDED_CAPSTONE_DEPR_INSTR
+#ifdef CAPSTONE_DEPR_INSTR
+#define INCLUDED_CAPSTONE_DEPR_INSTR // Ensures it is only included once
+
+[Instruction definitions of removed feature]
+
+#endif // CAPSTONE_DEPR_INSTR
+#endif // INCLUDED_CAPSTONE_DEPR_INSTR
+```
+
+_Note that the order of `#ifndef` and `#ifdef` matters (otherwise you'll get an error from `tblgen`)._
+
+5. Include the definitions in the current definition files with:
+
+```
+#define CAPSTONE_DEPR_INSTR
+include "<ARCH>Deprecated.td"
+```
+
+## Notes
+- It is possible that you have to change some definitions slightly,
+because certain classes no longer exist or were replaced (e.g.: `GCCBuiltin` -> `ClangBuiltin`).
+- Some new processors might need to have the feature flag (`Has<DeprecatedFeature>`) added
+to their `UnsupportedFeatures` list.

README.md

Lines changed: 81 additions & 32 deletions
@@ -1,44 +1,93 @@
-# The LLVM Compiler Infrastructure
+# Capstone's LLVM with refactored TableGen backends
 
-[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/llvm/llvm-project/badge)](https://securityscorecards.dev/viewer/?uri=github.com/llvm/llvm-project)
-[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/8273/badge)](https://www.bestpractices.dev/projects/8273)
-[![libc++](https://github.com/llvm/llvm-project/actions/workflows/libcxx-build-and-test.yaml/badge.svg?branch=main&event=schedule)](https://github.com/llvm/llvm-project/actions/workflows/libcxx-build-and-test.yaml?query=event%3Aschedule)
+The purpose of this LLVM version is to generate code for the
+[Capstone disassembler](https://github.com/capstone-engine/capstone).
 
-Welcome to the LLVM project!
+It refactors the TableGen emitter backends so that they can emit C code
+in addition to the C++ code they normally emit.
 
-This repository contains the source code for LLVM, a toolkit for the
-construction of highly optimized compilers, optimizers, and run-time
-environments.
+Note that within LLVM, an architecture is referred to as a `Target`.
 
-The LLVM project has multiple components. The core of the project is
-itself called "LLVM". This contains all of the tools, libraries, and header
-files needed to process intermediate representations and convert them into
-object files. Tools include an assembler, disassembler, bitcode analyzer, and
-bitcode optimizer.
+## Code generation
 
-C-like languages use the [Clang](http://clang.llvm.org/) frontend. This
-component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode
--- and from there into object files, using LLVM.
+### Relevant files
 
-Other components include:
-the [libc++ C++ standard library](https://libcxx.llvm.org),
-the [LLD linker](https://lld.llvm.org), and more.
+The TableGen emitter backends are located in `llvm/utils/TableGen/`.
 
-## Getting the Source Code and Building LLVM
+The target definition files (`.td`), which define the
+instructions, operands, features etc., can be
+found in `llvm/lib/Target/<ARCH>/`.
 
-Consult the
-[Getting Started with LLVM](https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm)
-page for information on building and running LLVM.
+### Code generation overview
 
-For information on how to contribute to the LLVM project, please take a look at
-the [Contributing to LLVM](https://llvm.org/docs/Contributing.html) guide.
+Generating code for a target has 6 steps:
 
-## Getting in touch
+```
+                                                                         5                 6
+                                                                    ┌──────────┐      ┌──────────┐
+                                                                    │Printer   │      │CS .inc   │
+    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │
+┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘
+│ .td   │     │           │     │           │     │ Code-    │  │
+│ files ├────►│ TableGen  ├────►│ CodeGen   ├────►│ Emitter  │◄─┤
+└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │
+                     │                                 ▲        │   ┌──────────┐      ┌──────────┐
+                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │
+                                                                    │LLVM      │      │files     │
+                                                                    └──────────┘      └──────────┘
+```
 
-Join the [LLVM Discourse forums](https://discourse.llvm.org/), [Discord
-chat](https://discord.gg/xS7Z362),
-[LLVM Office Hours](https://llvm.org/docs/GettingInvolved.html#office-hours) or
-[Regular sync-ups](https://llvm.org/docs/GettingInvolved.html#online-sync-ups).
+1. LLVM targets are defined in `.td` files. They describe instructions, operands,
+features and other properties.
 
-The LLVM project has adopted a [code of conduct](https://llvm.org/docs/CodeOfConduct.html) for
-participants to all modes of communication within the project.
+2. [LLVM TableGen](https://llvm.org/docs/TableGen/index.html) parses these files
+and converts them to an internal representation of [Classes, Records, DAGs](https://llvm.org/docs/TableGen/ProgRef.html)
+and other types.
+
+3. Next, a TableGen component called [CodeGen](https://llvm.org/docs/CodeGenerator.html)
+abstracts this even further.
+The result is a representation which is _not_ specific to any target
+(e.g. the `CodeGenInstruction` class can represent a machine instruction of any target).
+
+4. Different code emitter backends use the results of the former two components to
+generate code.
+
+5. Whenever the emitter emits code, it calls a `Printer`: either `PrinterCapstone` to emit C or `PrinterLLVM` to emit C++.
+Which one is used is controlled by the `--printerLang=[CCS,C++]` option passed to `llvm-tblgen`.
+
+6. After the emitter backend is done, the `Printer` writes the `output_stream` content into the `.inc` files.
+
+### Emitter backends and their use cases
+
+We use the following emitter backends:
+
+| Name | Generated Code | Note |
+|------|----------------|------|
+| AsmMatcherEmitter | Mapping tables for Capstone | |
+| AsmWriterEmitter | State machine to decode the asm-string for an `MCInst` | |
+| DecoderEmitter | State machine which decodes bytes to an `MCInst`. | |
+| InstrInfoEmitter | Tables with instruction information (instruction enum, instr. operand information...) | |
+| RegisterInfoEmitter | Tables with register information (register enum, register type info...) | |
+| SubtargetEmitter | Table about the target features. | |
+| SearchableTablesEmitter | Usually used to generate tables and decoding functions for system registers. | **1.** Not all targets use this. |
+| | | **2.** Backend can't access the target name. Wherever the target name is needed, `__ARCH__` or `##ARCH##` is printed and later replaced. |
+
+## Developer notes
+
+- If you find C++ code within the generated files, you need to extend `PrinterCapstone::translateToC()`.
+If this still doesn't fix the problem, the code snippet wasn't passed through `translateToC()` before emitting.
+So you need to figure out where this specific code snippet is printed and add `translateToC()`.
+
+- If the mapping files miss operand types or access information, then the `.td` files are incomplete (happens surprisingly often).
+You need to search for the instruction or operands with missing or incorrect values and fix them.
+```
+Wrong access attributes for:
+- Registers, Immediates: The instruction defines "out" and "in" operands incorrectly.
+- Memory: The "mayLoad" or "mayStore" variable is not set for the instruction.
+
+Operand type is invalid:
+- The "OperandType" variable is unset for this operand type.
+```
+
+- If certain target features (e.g. architecture extensions) were removed from LLVM or you want to add your own,
+check out [DeprecatedFeatures.md](DeprecatedFeatures.md).
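The README notes that `SearchableTablesEmitter` cannot access the target name, so it prints `__ARCH__` or `##ARCH##` and the placeholder is replaced later. The commit does not show that replacement step. The following is a minimal sketch that assumes a plain textual substitution over the generated `.inc` text. `replaceAll` and `replacePlaceholders` are hypothetical helpers and are not part of the commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every occurrence of `From` in `Text` with `To`.
// Scanning resumes after the inserted text, so a replacement that itself
// contains `From` is not expanded again.
std::string replaceAll(std::string Text, const std::string &From,
                       const std::string &To) {
  assert(!From.empty() && "an empty pattern would never advance");
  std::string::size_type Pos = 0;
  while ((Pos = Text.find(From, Pos)) != std::string::npos) {
    Text.replace(Pos, From.size(), To);
    Pos += To.size();
  }
  return Text;
}

// Hypothetical post-processing step: substitutes both placeholder spellings
// mentioned in the README with the target name (e.g. "ARM").
std::string replacePlaceholders(const std::string &Generated,
                                const std::string &Target) {
  return replaceAll(replaceAll(Generated, "__ARCH__", Target), "##ARCH##",
                    Target);
}
```

For example, `replacePlaceholders("lookup##ARCH##SysReg", "ARM")` yields `lookupARMSysReg`. Deferring the substitution keeps the emitter itself target-agnostic, which matches the constraint stated in the table.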

llvm/include/llvm/Support/Compiler.h

Lines changed: 1 addition & 1 deletion
@@ -151,7 +151,7 @@
 #define LLVM_ATTRIBUTE_USED
 #endif
 
-#if defined(__clang__)
+#if defined(__clang__) && !defined(__INTELLISENSE__)
 #define LLVM_DEPRECATED(MSG, FIX) __attribute__((deprecated(MSG, FIX)))
 #else
 #define LLVM_DEPRECATED(MSG, FIX) [[deprecated(MSG)]]

llvm/include/llvm/TableGen/StringMatcher.h

Lines changed: 10 additions & 1 deletion
@@ -13,6 +13,7 @@
 #ifndef LLVM_TABLEGEN_STRINGMATCHER_H
 #define LLVM_TABLEGEN_STRINGMATCHER_H
 
+#include "PrinterTypes.h"
 #include "llvm/ADT/StringRef.h"
 #include <string>
 #include <utility>
@@ -35,18 +36,26 @@ class StringMatcher {
   StringRef StrVariableName;
   const std::vector<StringPair> &Matches;
   raw_ostream &OS;
+  PrinterLanguage PL;
 
 public:
   StringMatcher(StringRef strVariableName,
                 const std::vector<StringPair> &matches, raw_ostream &os)
-      : StrVariableName(strVariableName), Matches(matches), OS(os) {}
+      : StrVariableName(strVariableName), Matches(matches), OS(os), PL(PRINTER_LANG_CPP) {}
+  StringMatcher(StringRef strVariableName,
+                const std::vector<StringPair> &matches, raw_ostream &os, PrinterLanguage PL)
+      : StrVariableName(strVariableName), Matches(matches), OS(os), PL(PL) {}
 
   void Emit(unsigned Indent = 0, bool IgnoreDuplicates = false) const;
+  void EmitCPP(unsigned Indent = 0, bool IgnoreDuplicates = false) const;
 
 private:
   bool EmitStringMatcherForChar(const std::vector<const StringPair *> &Matches,
                                 unsigned CharNo, unsigned IndentCount,
                                 bool IgnoreDuplicates) const;
+  bool EmitStringMatcherForCharCPP(const std::vector<const StringPair *> &Matches,
+                                   unsigned CharNo, unsigned IndentCount,
+                                   bool IgnoreDuplicates) const;
 };
 
 } // end namespace llvm
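The header now stores a `PrinterLanguage` next to the output stream, so one matcher object can decide which language to emit. The diff only shows the C++ path (`EmitCPP`, `EmitStringMatcherForCharCPP`). The sketch below illustrates the idea with a deliberately simplified emitter. It assumes a hypothetical `PRINTER_LANG_C` enumerator and prints a flat `if` chain instead of the length- and character-based `switch` that LLVM's `StringMatcher` generates. It also assumes the candidate strings need no escaping.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical language selector; only PRINTER_LANG_CPP appears in the diff.
enum PrinterLanguage { PRINTER_LANG_CPP, PRINTER_LANG_C };

// (string to match, code to run on a match), as in StringMatcher::StringPair.
using StringPair = std::pair<std::string, std::string>;

// Hypothetical emitter: one `if` per candidate, spelled per target language.
std::string emitMatcher(const std::string &Var,
                        const std::vector<StringPair> &Matches,
                        PrinterLanguage PL) {
  assert(!Var.empty() && "need a variable to match against");
  std::ostringstream OS;
  for (const auto &[Str, Code] : Matches) {
    if (PL == PRINTER_LANG_CPP) // C++: Var assumed to be a StringRef/std::string
      OS << "if (" << Var << " == \"" << Str << "\") { " << Code << " }\n";
    else // C: Var assumed to be a NUL-terminated const char *
      OS << "if (strcmp(" << Var << ", \"" << Str << "\") == 0) { " << Code
         << " }\n";
  }
  return OS.str();
}
```

Passing the language at construction time (as the new constructor overload does) keeps every call site that uses the old three-argument constructor on the C++ path by default.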

llvm/include/llvm/TableGen/StringToOffsetTable.h

Lines changed: 16 additions & 0 deletions
@@ -9,10 +9,12 @@
 #ifndef LLVM_TABLEGEN_STRINGTOOFFSETTABLE_H
 #define LLVM_TABLEGEN_STRINGTOOFFSETTABLE_H
 
+#include "PrinterTypes.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/TableGen/Error.h"
 #include <cctype>
 
 namespace llvm {
@@ -22,10 +24,14 @@ namespace llvm {
 /// It can then output this string blob and use indexes into the string to
 /// reference each piece.
 class StringToOffsetTable {
+  PrinterLanguage PL;
   StringMap<unsigned> StringOffset;
   std::string AggregateString;
 
 public:
+  StringToOffsetTable() : PL(PRINTER_LANG_CPP) {};
+  StringToOffsetTable(PrinterLanguage PL) : PL(PL) {};
+
   bool Empty() const { return StringOffset.empty(); }
 
   unsigned GetOrAddStringOffset(StringRef Str, bool appendZero = true) {
@@ -42,6 +48,16 @@ class StringToOffsetTable {
   }
 
   void EmitString(raw_ostream &O) {
+    switch(PL) {
+    default:
+      PrintFatalNote("No StringToOffsetTable method defined to emit the selected language.\n");
+    case PRINTER_LANG_CPP:
+      EmitStringCPP(O);
+      break;
+    }
+  }
+
+  void EmitStringCPP(raw_ostream &O) {
     // Escape the string.
     SmallString<256> Str;
     raw_svector_ostream(Str).write_escaped(AggregateString);
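`StringToOffsetTable` stores every distinct string once in a single blob and hands out offsets into it. The commit adds a `PrinterLanguage` member and routes `EmitString` through a `switch` on it. Below is a simplified, self-contained stand-in for both ideas. It assumes ASCII input and a hypothetical `PRINTER_LANG_C` enumerator. It also differs from the real class in three ways: it uses `std::unordered_map` instead of `StringMap`, it has no `appendZero = false` mode, and it prints a byte list where the real `EmitString` prints an escaped string literal.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical language selector; only PRINTER_LANG_CPP appears in the diff.
enum PrinterLanguage { PRINTER_LANG_CPP, PRINTER_LANG_C };

// Simplified stand-in for llvm::StringToOffsetTable.
class OffsetTable {
  PrinterLanguage PL;
  std::unordered_map<std::string, unsigned> Offsets;
  std::string Blob; // all strings, each followed by '\0'

public:
  explicit OffsetTable(PrinterLanguage PL = PRINTER_LANG_CPP) : PL(PL) {}

  // Returns the offset of S in the blob, appending it only on first use.
  unsigned getOrAdd(const std::string &S) {
    assert(S.find('\0') == std::string::npos && "embedded NUL splits entry");
    auto It = Offsets.find(S);
    if (It != Offsets.end())
      return It->second; // already stored: reuse the earlier offset
    unsigned Off = static_cast<unsigned>(Blob.size());
    Blob += S;
    Blob += '\0';
    Offsets.emplace(S, Off);
    return Off;
  }

  const std::string &blob() const { return Blob; }

  // Dispatches on the printer language, like EmitString in the diff.
  // C has no constexpr (before C23), so the C variant uses plain const.
  std::string emit(const std::string &Name) const {
    std::ostringstream OS;
    if (PL == PRINTER_LANG_C)
      OS << "static const char " << Name << "[] = {";
    else
      OS << "static constexpr char " << Name << "[] = {";
    for (unsigned char C : Blob) // byte list; values > 127 assumed absent
      OS << unsigned(C) << ",";
    OS << "};\n";
    return OS.str();
  }
};
```

Adding `"add"`, `"sub"` and `"add"` again yields offsets 0, 4 and 0, and an 8-byte blob (`add\0sub\0`). Generated tables can then store small integer offsets instead of one pointer per string.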

llvm/lib/Support/BLAKE3/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ if (LLVM_DISABLE_ASSEMBLY_FILES)
 else()
   set(CAN_USE_ASSEMBLER TRUE)
 endif()
+set(CAN_USE_ASSEMBLER FALSE)
 
 macro(disable_blake3_x86_simd)
   add_compile_definitions(BLAKE3_NO_AVX512 BLAKE3_NO_AVX2 BLAKE3_NO_SSE41 BLAKE3_NO_SSE2)

llvm/lib/TableGen/CMakeLists.txt

Lines changed: 0 additions & 1 deletion
@@ -6,7 +6,6 @@ add_llvm_component_library(LLVMTableGen
   Parser.cpp
   Record.cpp
   SetTheory.cpp
-  StringMatcher.cpp
   TableGenBackend.cpp
   TableGenBackendSkeleton.cpp
   TGLexer.cpp

llvm/lib/Target/ARM/ARMInstrFormats.td

Lines changed: 15 additions & 0 deletions
@@ -669,6 +669,7 @@ class AIldr_ex_or_acq<bits<2> opcod, bits<2> opcod2, dag oops, dag iops, InstrIt
   let Inst{11-10} = 0b11;
   let Inst{9-8} = opcod2;
   let Inst{7-0} = 0b10011111;
+  let mayLoad = 1;
 }
 class AIstr_ex_or_rel<bits<2> opcod, bits<2> opcod2, dag oops, dag iops, InstrItinClass itin,
                       string opc, string asm, list<dag> pattern>
@@ -684,6 +685,7 @@ class AIstr_ex_or_rel<bits<2> opcod, bits<2> opcod2, dag oops, dag iops, InstrIt
   let Inst{9-8} = opcod2;
   let Inst{7-4} = 0b1001;
   let Inst{3-0} = Rt;
+  let mayStore = 1;
 }
 // Atomic load/store instructions
 class AIldrex<bits<2> opcod, dag oops, dag iops, InstrItinClass itin,
@@ -695,6 +697,7 @@ class AIstrex<bits<2> opcod, dag oops, dag iops, InstrItinClass itin,
   : AIstr_ex_or_rel<opcod, 0b11, oops, iops, itin, opc, asm, pattern> {
   bits<4> Rd;
   let Inst{15-12} = Rd;
+  let mayLoad = 1;
 }
 
 // Exclusive load/store instructions
@@ -792,6 +795,8 @@ class AI2ldstidx<bit isLd, bit isByte, bit isPre, dag oops, dag iops,
   let Inst{21} = isPre; // W bit
   let Inst{20} = isLd; // L bit
   let Inst{15-12} = Rt;
+  let mayLoad = isLd;
+  let mayStore = !eq(isLd, 0);
 }
 class AI2stridx_reg<bit isByte, bit isPre, dag oops, dag iops,
                     IndexMode im, Format f, InstrItinClass itin, string opc,
@@ -809,6 +814,7 @@ class AI2stridx_reg<bit isByte, bit isPre, dag oops, dag iops,
   let Inst{11-5} = offset{11-5};
   let Inst{4} = 0;
   let Inst{3-0} = offset{3-0};
+  let mayStore = 1;
 }
 
 class AI2stridx_imm<bit isByte, bit isPre, dag oops, dag iops,
@@ -825,6 +831,7 @@ class AI2stridx_imm<bit isByte, bit isPre, dag oops, dag iops,
   let Inst{23} = offset{12};
   let Inst{19-16} = Rn;
   let Inst{11-0} = offset{11-0};
+  let mayStore = 1;
 }
 
 
@@ -845,6 +852,7 @@ class AI2stridxT<bit isByte, bit isPre, dag oops, dag iops,
   let Inst{23} = addr{12};
   let Inst{19-16} = addr{17-14};
   let Inst{11-0} = addr{11-0};
+  let mayStore = 1;
 }
 
 // addrmode3 instructions
@@ -865,6 +873,8 @@ class AI3ld<bits<4> op, bit op20, dag oops, dag iops, Format f,
   let Inst{11-8} = addr{7-4}; // imm7_4/zero
   let Inst{7-4} = op;
   let Inst{3-0} = addr{3-0}; // imm3_0/Rm
+  let mayLoad = op20;
+  let mayStore = 0;
 
   let DecoderMethod = "DecodeAddrMode3Instruction";
 }
@@ -881,6 +891,8 @@ class AI3ldstidx<bits<4> op, bit op20, bit isPre, dag oops, dag iops,
   let Inst{20} = op20; // L bit
   let Inst{15-12} = Rt; // Rt
   let Inst{7-4} = op;
+  let mayLoad = op20;
+  let mayStore = !if(op20, 0, 1);
 }
 
 // FIXME: Merge with the above class when addrmode2 gets used for LDR, LDRB
@@ -903,6 +915,8 @@ class AI3ldstidxT<bits<4> op, bit isLoad, dag oops, dag iops,
   let Inst{19-16} = addr; // Rn
   let Inst{15-12} = Rt; // Rt
   let Inst{7-4} = op;
+  let mayLoad = isLoad;
+  let mayStore = !if(isLoad, 0, 1);
 }
 
 // stores
@@ -924,6 +938,7 @@ class AI3str<bits<4> op, dag oops, dag iops, Format f, InstrItinClass itin,
   let Inst{7-4} = op;
   let Inst{3-0} = addr{3-0}; // imm3_0/Rm
   let DecoderMethod = "DecodeAddrMode3Instruction";
+  let mayStore = 1;
 }
 
 // addrmode4 instructions
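In `AI2ldstidx` and `AI3ldstidx`, bit 20 of the encoding is the L bit (`Inst{20} = isLd` / `op20`). The new `let` lines derive the memory flags from it: `mayLoad = isLd` and `mayStore = !eq(isLd, 0)` (equivalently `!if(op20, 0, 1)`), so exactly one flag is set. A minimal C++ sketch of the same mapping, applied to a raw A32 word, is shown below. `MemAccess` and `accessFromLBit` are hypothetical names for illustration; Capstone reads these flags from the generated tables, not from such a function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two flags the .td changes set.
struct MemAccess {
  bool mayLoad;
  bool mayStore;
};

// Mirrors `let mayLoad = isLd; let mayStore = !eq(isLd, 0);`:
// bit 20 (the L bit) selects load (1) or store (0).
MemAccess accessFromLBit(uint32_t Inst) {
  const bool IsLoad = (Inst >> 20) & 1u; // Inst{20}
  return MemAccess{IsLoad, !IsLoad};
}
```

For instance, `0xE5912000` (`LDR r2, [r1]`) has the L bit set and maps to a load, while `0xE5812000` (`STR r2, [r1]`) maps to a store.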
