Commit 749d230

RFC: CFI Improvements with PAuth and BTI
Improve control flow integrity for compiled WebAssembly code by utilizing two technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target Identification. Copyright (c) 2021, Arm Limited.
1 parent 2821d03 commit 749d230

@@ -0,0 +1,142 @@
# Summary
[summary]: #summary

This RFC proposes to improve control flow integrity for compiled WebAssembly code by utilizing two
technologies from the Arm instruction set architecture: Pointer Authentication and Branch Target
Identification.

# Motivation
[motivation]: #motivation

The [security model of WebAssembly](https://webassembly.org/docs/security/) ensures that Wasm
modules execute in a sandboxed environment isolated from the host runtime. One aspect of that model
is that it provides implicit control flow integrity (CFI) by forcing all function call targets to
specify a valid entry in the function index space, by using a protected call stack that is not
affected by buffer overflows in the module heap, and so on. As a result, in some Wasm applications
the runtime is able to execute untrusted code safely. However, that places much of the burden of
upholding the security properties on the compiler.

On the other hand, a further aspect of the WebAssembly design is efficient execution (close to
native speed), which leads to a natural tendency towards sophisticated optimizing compilers.
Unfortunately, the additional complexity increases the risk of implementation problems and, in
particular, of compromising the security properties. For example, Cranelift has been affected by
issues such as [CVE-2021-32629][cve] that could make it possible to access the protected call stack
or memory that is private to the host runtime.

We are trying to tackle the challenge of ensuring compiler correctness with initiatives such as
expanding fuzzing and making it possible to apply formal verification to at least some parts of the
compilation process. However, it is also reasonable to consider a defense-in-depth strategy and to
evaluate mitigations for potential future issues.

Finally, Wasmtime can be used as a library and, in particular, embedded into an application that is
implemented in languages, such as C and C++, that lack some of the hardening provided by Rust. In
that case the compiled WebAssembly code could provide convenient instruction sequences for attacks
that subvert normal control flow and that originate from the embedder's code, even if Cranelift and
Wasmtime themselves lack any defects.

[cve]: https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-hpqh-2wqx-7qp5

# Proposal
[proposal]: #proposal

Currently this proposal focuses on the AArch64 execution environment.

## Background

The Pointer Authentication (PAuth) extension to the Arm architecture protects function returns, i.e.
it provides back-edge CFI. It is described in section D5.1.5 of
[the Arm Architecture Reference Manual][arm-arm]. Some of the PAuth operations act as `NOP`
instructions when executed by a processor that does not support the extension.

The Branch Target Identification (BTI) extension protects other kinds of indirect branches, that is,
it provides forward-edge CFI. It is described in section D5.4.4 of the same manual. A processor
implementation with BTI would support PAuth as well, but not necessarily vice versa. Whether BTI
applies to an executable memory page or not is controlled by a dedicated page attribute. Note that
the `BTI` "landing pad" for indirect branches acts as a `NOP` instruction when the extension is not
active (e.g. for processors that do not support BTI).

Both extensions are applicable only to the AArch64 execution state and are optional, so each CFI
technique would be employed only if the target environment provides the necessary ISA support.
Wasmtime embedders need to consider a subtlety: if they cache the result of the check, the cached
value may reside in memory that is potentially accessible to an attacker, who could then disable the
use of PAuth and BTI in subsequent code generation. Mitigating this issue is outside the scope of
this proposal.

The article [*Code reuse attacks: The compiler story*][code-reuse-attacks] provides an introduction
to the technologies.

[arm-arm]: https://developer.arm.com/documentation/ddi0487/gb/?lang=en
[code-reuse-attacks]: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/code-reuse-attacks-the-compiler-story

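As an illustration of the ISA support check mentioned in this section, the following minimal Rust
sketch decodes the Linux hardware capability words into two flags. The `CfiSupport` type and the
`detect_cfi_support` function are hypothetical names and are not part of Wasmtime or Cranelift. The
bit positions are the ones that the Linux kernel defines for `HWCAP_PACA` and `HWCAP2_BTI`, and
obtaining the two words (e.g. with `getauxval`) is assumed to happen elsewhere. Treating BTI as
usable only together with PAuth is an assumption of the sketch that mirrors the way this proposal
relies on `PACIASP` as a landing pad.

```rust
/// Linux AArch64 hardware capability bits, as defined in the kernel's
/// `asm/hwcap.h`. The words are reported by `getauxval(AT_HWCAP)` and
/// `getauxval(AT_HWCAP2)` respectively.
pub const HWCAP_PACA: u64 = 1 << 30; // address authentication (e.g. PACIASP)
pub const HWCAP2_BTI: u64 = 1 << 17; // Branch Target Identification

/// Hypothetical summary of the CFI features that code generation may use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CfiSupport {
    pub pauth: bool,
    pub bti: bool,
}

/// Decodes the two capability words into `CfiSupport`.
pub fn detect_cfi_support(hwcap: u64, hwcap2: u64) -> CfiSupport {
    let pauth = (hwcap & HWCAP_PACA) != 0;
    // Assumption of this sketch: BTI is only enabled alongside PAuth, because
    // function prologues rely on PACIASP doubling as a landing pad.
    let bti = pauth && (hwcap2 & HWCAP2_BTI) != 0;
    CfiSupport { pauth, bti }
}
```

For example, `detect_cfi_support(1 << 30, 0)` reports PAuth without BTI. Other operating systems
expose this information through different interfaces.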
## Improved back-edge CFI with PAuth

The proposed implementation will add the `PACIASP` instruction to the beginning of every function
compiled by Cranelift and will replace the final return with the `RETAA` instruction.

In environments that use the DWARF format for unwinding, the implementation will be modified to
apply the `DW_CFA_AARCH64_negate_ra_state` operation immediately after the `PACIASP` instruction.

These steps can be skipped for simple leaf functions that do not construct frame records on the
stack.

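The selection logic described in this section can be sketched in Rust as follows. This is a
simplified model, not Cranelift's actual code: `FunctionInfo` and its methods are hypothetical
names, and a real backend would emit instructions through its own machinery. The instruction
encodings are fixed by the Arm architecture, and the opcode value `0x2D` for
`DW_CFA_AARCH64_negate_ra_state` comes from the DWARF ABI for AArch64.

```rust
/// A64 instruction encodings defined by the Arm architecture.
pub const PACIASP: u32 = 0xD503_233F;
pub const RETAA: u32 = 0xD65F_0BFF;
pub const RET: u32 = 0xD65F_03C0;
/// DWARF call frame instruction opcode from the AArch64 DWARF ABI.
pub const DW_CFA_AARCH64_NEGATE_RA_STATE: u8 = 0x2D;

/// Hypothetical summary of what the code generator knows about a function.
#[derive(Debug, Clone, Copy)]
pub struct FunctionInfo {
    /// Whether the target environment supports PAuth.
    pub pauth_enabled: bool,
    /// Whether the function constructs a frame record on the stack.
    pub has_frame_record: bool,
}

impl FunctionInfo {
    /// Simple leaf functions without a frame record skip the extra steps.
    pub fn signs_return_address(&self) -> bool {
        self.pauth_enabled && self.has_frame_record
    }

    /// Instruction to place at the very beginning of the function, if any.
    pub fn prologue_prefix(&self) -> Option<u32> {
        self.signs_return_address().then_some(PACIASP)
    }

    /// Unwind operation to apply immediately after the prefix, if any.
    pub fn unwind_op_after_prefix(&self) -> Option<u8> {
        self.signs_return_address()
            .then_some(DW_CFA_AARCH64_NEGATE_RA_STATE)
    }

    /// Instruction used for the final return.
    pub fn return_instruction(&self) -> u32 {
        if self.signs_return_address() {
            RETAA
        } else {
            RET
        }
    }
}
```

With PAuth enabled, a function that has a frame record starts with `PACIASP` and ends with `RETAA`,
while a simple leaf function keeps the plain `RET`.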
## Enhanced forward-edge CFI with BTI

The proposed implementation will add the `BTI j` instruction to the beginning of every basic block
that is the target of an indirect branch and that is not a function prologue. Note that function
calls generated by the AArch64 backend always target function prologues, and that indirect branches
that do not act like function calls appear only in the implementation of the `br_table` IR
operation. Function prologues would be covered by the pointer authentication instructions, which
also act as landing pads; as discussed before, BTI support implies PAuth.

During development, one simple way to create a working prototype is to add the landing pads to the
beginning of every basic block, irrespective of whether it is the target of an indirect branch or
not. In this way it can be checked whether BTI causes any issues with the rest of the runtime.

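A minimal Rust sketch of the landing pad placement rule follows. The `Terminator` type is a toy
model of a control flow graph and is not Cranelift's IR; `blocks_needing_bti_j` is a hypothetical
name. The sketch assumes that only the jump table entries of a `br_table` are reached by an
indirect branch and that its default target is reached by a direct branch; an implementation that
routes the default target through the table would have to mark it as well. The `pad_every_block`
flag models the prototype approach of adding a landing pad to every basic block.

```rust
use std::collections::BTreeSet;

/// Index of a basic block within a function (toy model).
pub type Block = usize;

/// Toy block terminators; `terminators[i]` ends block `i`.
pub enum Terminator {
    Jump(Block),
    CondBranch(Block, Block),
    /// Assumption: `table` entries are reached by an indirect branch, while
    /// `default` is reached by a direct branch.
    BrTable { table: Vec<Block>, default: Block },
    Return,
}

/// Returns the blocks that should start with `BTI j`.
pub fn blocks_needing_bti_j(
    entry: Block,
    terminators: &[Terminator],
    pad_every_block: bool,
) -> BTreeSet<Block> {
    let mut pads = BTreeSet::new();
    if pad_every_block {
        // Prototype mode: every block gets a landing pad.
        pads.extend(0..terminators.len());
    } else {
        for terminator in terminators {
            if let Terminator::BrTable { table, .. } = terminator {
                pads.extend(table.iter().copied());
            }
        }
    }
    // The function prologue is covered by the pointer authentication
    // instruction, so the entry block never gets `BTI j`.
    pads.remove(&entry);
    pads
}
```

For a function whose entry block ends in a `br_table` over blocks 1 and 2 with default block 3, the
result is blocks 1 and 2, or blocks 1, 2, and 3 in prototype mode.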
## CFI improvements to assembly, C, C++, and Rust code

Improving CFI for compiled C, C++, and Rust code with the same technologies is outside the scope of
this proposal, but in general it should be achievable by passing the appropriate parameters to the
respective compiler.

Functions implemented in assembly will be treated similarly to generated code, i.e. they will start
with the `PACIASP` instruction. However, the regular return will be preserved and will be preceded
by the `AUTIASP` instruction instead of being replaced with `RETAA`. The reason is that both
`AUTIASP` and `PACIASP` act as `NOP` instructions when executed by a processor that does not support
PAuth, thus making the assembly code generic.

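The reason that `PACIASP` and `AUTIASP` degrade to `NOP` instructions is that they are allocated in
the A64 `HINT` encoding space, whereas `RETAA` is not. The following Rust sketch reconstructs the
encodings to make that visible; the helper names `hint` and `is_hint` are hypothetical, while the
encodings themselves are fixed by the Arm architecture.

```rust
/// Encodes `HINT #imm7`; the 7-bit immediate occupies bits [11:5].
pub const fn hint(imm7: u32) -> u32 {
    0xD503_201F | ((imm7 & 0x7F) << 5)
}

pub const NOP: u32 = hint(0);
pub const PACIASP: u32 = hint(25);
pub const AUTIASP: u32 = hint(29);
pub const BTI_C: u32 = hint(34);
pub const BTI_J: u32 = hint(36);
pub const RET: u32 = 0xD65F_03C0;
pub const RETAA: u32 = 0xD65F_0BFF;

/// Returns true if `insn` lies in the `HINT` encoding space, i.e. it
/// executes as a `NOP` when the corresponding extension is absent.
pub fn is_hint(insn: u32) -> bool {
    (insn & 0xFFFF_F01F) == 0xD503_201F
}

/// Return sequence for assembly functions that must run on any AArch64
/// processor: authenticate with a hint, then use the regular return.
pub fn generic_return_sequence() -> [u32; 2] {
    [AUTIASP, RET]
}
```

Here `is_hint(PACIASP)` and `is_hint(AUTIASP)` hold while `is_hint(RETAA)` does not, which is why
the generic assembly code keeps the regular return.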
# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Since the existing implementation already uses the standard back-edge CFI techniques that are
preferred in the absence of special hardware support (i.e. a separate protected stack that is not
used for buffers that could be accessed out of bounds), the alternative is not to implement the
proposal, so the rationale is based mainly on the overhead being insignificant. In terms of code
size, the impact of the back-edge CFI improvements is one additional instruction per function, or
two for functions implemented in assembly.

The [Clang CFI design][clang-cfi-design] provides an idea for an alternative implementation of the
forward-edge CFI mechanism that is enabled by BTI. It involves instrumenting every indirect branch
to check if its destination is permitted. While the overhead of this approach can be reduced by
using efficient data structures for the destination address lookup and optionally limiting the
checks only to indirect function calls, it is still significantly larger than the worst-case BTI
overhead of one additional instruction per basic block. On the other hand, it does not require any
special hardware support, so it could be applied to all supported platforms.

[clang-cfi-design]: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html

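To give a feel for the software alternative, the following Rust sketch models a destination check
that instrumentation could perform before each indirect branch. It is a simplified illustration
only and does not reproduce Clang's actual scheme: the `AllowedTargets` type is a hypothetical name,
and the sketch assumes a single contiguous code region with one bit per 4-byte instruction slot.

```rust
/// Hypothetical bit set of permitted indirect branch destinations within one
/// contiguous code region (one bit per 4-byte A64 instruction slot).
pub struct AllowedTargets {
    base: u64,
    code_len: u64,
    bits: Vec<u64>,
}

impl AllowedTargets {
    /// Builds the set; targets outside the region or misaligned are ignored.
    pub fn new(base: u64, code_len: u64, targets: &[u64]) -> Self {
        let slots = (code_len / 4) as usize;
        let mut bits = vec![0u64; (slots + 63) / 64];
        for &target in targets {
            if let Some(slot) = Self::slot(base, code_len, target) {
                bits[slot / 64] |= 1u64 << (slot % 64);
            }
        }
        Self { base, code_len, bits }
    }

    /// Maps an address to its instruction slot, if it names a whole
    /// instruction inside the region.
    fn slot(base: u64, code_len: u64, addr: u64) -> Option<usize> {
        let offset = addr.checked_sub(base)?;
        if offset % 4 != 0 || offset.checked_add(4)? > code_len {
            return None;
        }
        Some((offset / 4) as usize)
    }

    /// The check that would run before every instrumented indirect branch.
    pub fn permits(&self, addr: u64) -> bool {
        match Self::slot(self.base, self.code_len, addr) {
            Some(slot) => (self.bits[slot / 64] & (1u64 << (slot % 64))) != 0,
            None => false,
        }
    }
}
```

Even in this compact form, each check costs a subtraction, bounds and alignment tests, a load, and a
bit test, compared with the single landing pad instruction that BTI needs at the destination.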
# Open questions
[open-questions]: #open-questions

- What is the performance overhead of the proposal?
- What technologies are available in other instruction set architectures to achieve the same goals?
- What hardening approaches are applicable to the fiber implementation? The fiber switching code
  saves the values of all callee-saved registers on the stack, i.e. memory that is potentially
  accessible to an attacker. Some of those values could be code addresses that would be used by
  indirect branches, so should we devise a scheme to authenticate them? While the regular pointer
  authentication instructions assume that they are operating on valid virtual addresses (which
  implies that the most significant bits are redundant and could be repurposed), PAuth provides
  operations to authenticate arbitrary data, which could be used in this case.
- Should we generate the operations that act as `NOP` instructions unconditionally instead (while
  still choosing the shorter alternative sequences if the target supports them)? That would
  especially help the ahead-of-time compilation use case, and could arguably reduce the amount of
  testing, i.e. there would be no need to check both with and without the CFI enhancements.
