| 1 | +# Summary |
| 2 | +[summary]: #summary |
| 3 | + |
| 4 | +This RFC proposes to improve control flow integrity for compiled WebAssembly code by utilizing two |
| 5 | +technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target |
| 6 | +Identification. |
| 7 | + |
| 8 | +# Motivation |
| 9 | +[motivation]: #motivation |
| 10 | + |
| 11 | +The [security model of WebAssembly](https://webassembly.org/docs/security/) ensures that Wasm |
| 12 | +modules execute in a sandboxed environment isolated from the host runtime. One aspect of that model |
| 13 | +is that it provides implicit control flow integrity (CFI) by forcing all function call targets to |
| 14 | +specify a valid entry in the function index space, by using a protected call stack that is not |
| 15 | +affected by buffer overflows in the module heap, and so on. As a result, in some Wasm applications |
| 16 | +the runtime is able to execute untrusted code safely. However, that places much of the burden of |
| 17 | +upholding these security properties on the compiler. |
| 18 | + |
| 19 | +On the other hand, a further aspect of the WebAssembly design is efficient execution (close to |
| 20 | +native speed), which leads to a natural tendency towards sophisticated optimizing compilers. |
| 21 | +Unfortunately, the additional complexity increases the risk of implementation problems and in |
| 22 | +particular compromises of the security properties. For example, Cranelift has been affected by |
| 23 | +issues such as CVE-2021-32629 [cve] that could make it possible to access the protected call stack |
| 24 | +or memory that is private to the host runtime. |
| 25 | + |
| 26 | +We are trying to tackle the challenge of ensuring compiler correctness with initiatives such as |
| 27 | +expanding fuzzing and making it possible to apply formal verification to at least some parts of the |
| 28 | +compilation process. However, it is also reasonable to consider a defense-in-depth strategy and to |
| 29 | +evaluate mitigations for potential future issues. |
| 30 | + |
| 31 | +Finally, Wasmtime can be used as a library and in particular embedded into an application that is |
| 32 | +implemented in languages such as C and C++ that lack some of the hardening provided by Rust. In that |
| 33 | +case the compiled WebAssembly code could provide convenient instruction sequences for attacks that |
| 34 | +subvert normal control flow and that originate from the embedder's code, even if Cranelift and |
| 35 | +Wasmtime themselves lack any defects. |
| 36 | + |
| 37 | +[cve]: https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-hpqh-2wqx-7qp5 |
| 38 | + |
| 39 | +# Proposal |
| 40 | +[proposal]: #proposal |
| 41 | + |
| 42 | +Currently this proposal focuses on the AArch64 execution environment. |
| 43 | + |
| 44 | +## Background |
| 45 | + |
| 46 | +The Pointer Authentication (PAuth) extension to the Arm architecture protects function returns, i.e. |
| 47 | +provides back-edge CFI. It is described in section D5.1.5 of |
| 48 | +[the Arm Architecture Reference Manual][arm-arm]. Some of the PAuth operations act as `NOP` |
| 49 | +instructions when executed by a processor that does not support the extension. |
| 50 | + |
| 51 | +The Branch Target Identification (BTI) extension protects other kinds of indirect branches, i.e. it |
| 52 | +provides forward-edge CFI, and is described in section D5.4.4. A processor implementation with BTI |
| 53 | +would support PAuth as well, but not necessarily vice versa. Whether BTI applies to an executable |
| 54 | +memory page or not is controlled by a dedicated page attribute. Note that the `BTI` "landing pad" |
| 55 | +for indirect branches acts as a `NOP` instruction when the extension is not active (e.g. for |
| 56 | +processors that do not support BTI). |
| 57 | + |
| 58 | +Both extensions are applicable only to the AArch64 execution state and are optional, so each CFI |
| 59 | +technique would be employed only if the target environment provides the necessary ISA support. |
| 60 | +Wasmtime embedders need to consider a subtlety: if they cache the result of the check, the cached |
| 61 | +value may reside in memory that is potentially accessible to an attacker, who could then use it to |
| 62 | +disable PAuth and BTI in subsequent code generation. Mitigating this issue is |
| 63 | +outside the scope of this proposal. |
| 64 | + |
| 65 | +The article [*Code reuse attacks: The compiler story*][code-reuse-attacks] provides an introduction |
| 66 | +to the technologies. |
| 67 | + |
| 68 | +[arm-arm]: https://developer.arm.com/documentation/ddi0487/gb/?lang=en |
| 69 | +[code-reuse-attacks]: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/code-reuse-attacks-the-compiler-story |
| 70 | + |
| 71 | +## Improved back-edge CFI with PAuth |
| 72 | + |
| 73 | +The proposed implementation will add the `PACIASP` instruction to the beginning of every function |
| 74 | +compiled by Cranelift and will replace the final return with the `RETAA` instruction. |
| 75 | + |
| 76 | +In environments that use the DWARF format for unwinding, the implementation will be modified to |
| 77 | +apply the `DW_CFA_AARCH64_negate_ra_state` operation immediately after the `PACIASP` instruction. |
| 78 | + |
| 79 | +These steps can be skipped for simple leaf functions that do not construct frame records on the |
| 80 | +stack. |
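
The steps above can be sketched roughly as follows. This is an illustrative model only, not Cranelift's actual code: `FrameInfo`, `Item`, `pauth_prologue`, and `return_instruction` are hypothetical names, and instructions are represented as mnemonic strings rather than encoded machine instructions.

```rust
/// Hypothetical summary of what the backend knows about one function.
#[derive(Clone, Copy)]
struct FrameInfo {
    /// The function constructs a frame record on the stack (i.e. it is not a simple leaf).
    has_frame_record: bool,
    /// The target environment is known to support PAuth.
    pauth_enabled: bool,
    /// The environment uses DWARF for unwinding.
    dwarf_unwind: bool,
}

/// An item placed at the very start of the function, before the usual frame setup.
#[derive(Debug, PartialEq)]
enum Item {
    /// A machine instruction, given by its mnemonic.
    Inst(&'static str),
    /// A DWARF call frame operation, given by its name.
    Cfi(&'static str),
}

/// Returns true if the return address of this function should be signed.
fn signs_return_address(info: FrameInfo) -> bool {
    info.pauth_enabled && info.has_frame_record
}

/// Items emitted at function entry for return address signing.
fn pauth_prologue(info: FrameInfo) -> Vec<Item> {
    let mut items = Vec::new();
    if signs_return_address(info) {
        items.push(Item::Inst("paciasp"));
        if info.dwarf_unwind {
            // Tell the unwinder that the saved return address is signed from here on.
            items.push(Item::Cfi("DW_CFA_AARCH64_negate_ra_state"));
        }
    }
    items
}

/// Mnemonic of the final return: authenticate-and-return, or a plain return.
fn return_instruction(info: FrameInfo) -> &'static str {
    if signs_return_address(info) {
        "retaa"
    } else {
        "ret"
    }
}
```

In this sketch a simple leaf function gets an empty prologue and a plain `ret`, which matches the exemption described above.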
| 81 | + |
| 82 | +## Enhanced forward-edge CFI with BTI |
| 83 | + |
| 84 | +The proposed implementation will add the `BTI j` instruction to the beginning of every basic block |
| 85 | +that is the target of an indirect branch and that is not a function prologue. Note that in the |
| 86 | +AArch64 backend, generated function calls always target function prologues, and indirect branches that |
| 87 | +do not act like function calls appear only in the implementation of the `br_table` IR operation. |
| 88 | +Function prologues would be covered by the pointer authentication instructions, which also act as |
| 89 | +landing pads; as discussed before, BTI support implies PAuth. |
| 90 | + |
| 91 | +During development, one simple way to create a working prototype is to add the landing pads to the |
| 92 | +beginning of every basic block, irrespective of whether it is the target of an indirect branch or |
| 93 | +not. This makes it possible to check whether BTI causes any issues with the rest of the runtime. |
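
The placement rule and the prototype shortcut can be expressed as a small predicate. The sketch below is a simplified model under stated assumptions: `BlockInfo`, `PadPolicy`, `needs_bti_j`, and `count_landing_pads` are hypothetical names that do not come from Cranelift, and the entry block is assumed to be left to the PAuth instruction in both modes.

```rust
/// Hypothetical per-block facts that the backend would need to track.
#[derive(Clone, Copy)]
struct BlockInfo {
    /// The block can be reached by an indirect branch (e.g. a `br_table` jump).
    is_indirect_branch_target: bool,
    /// The block is the function prologue.
    is_function_entry: bool,
}

/// How aggressively landing pads are inserted.
#[derive(Clone, Copy)]
enum PadPolicy {
    /// Intended behaviour: pad only the targets of indirect branches.
    IndirectTargetsOnly,
    /// Prototype shortcut: pad every basic block.
    EveryBlock,
}

/// Returns true if the block should start with a `BTI j` landing pad.
fn needs_bti_j(block: BlockInfo, policy: PadPolicy) -> bool {
    if block.is_function_entry {
        // Assumption: the prologue is covered by the pointer authentication instruction.
        return false;
    }
    match policy {
        PadPolicy::IndirectTargetsOnly => block.is_indirect_branch_target,
        PadPolicy::EveryBlock => true,
    }
}

/// Counts the landing pads a function would receive, i.e. its code size cost in instructions.
fn count_landing_pads(blocks: &[BlockInfo], policy: PadPolicy) -> usize {
    blocks.iter().filter(|b| needs_bti_j(**b, policy)).count()
}
```

Comparing the two policies on the same function gives a quick estimate of how much code size the prototype mode costs relative to the intended placement.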
| 94 | + |
| 95 | +## CFI improvements to assembly, C, C++, and Rust code |
| 96 | + |
| 97 | +Improving CFI for compiled C, C++, and Rust code with the same technologies is outside the scope of |
| 98 | +this proposal, but in general it should be achievable by passing the appropriate parameters to the |
| 99 | +respective compiler. |
| 100 | + |
| 101 | +Functions implemented in assembly will be treated similarly to generated code, i.e. they will |
| 102 | +start with the `PACIASP` instruction. However, the regular return will be preserved and will |
| 103 | +be preceded by the `AUTIASP` instruction. The reason is that both `AUTIASP` and `PACIASP` act as |
| 104 | +`NOP` instructions when executed by a processor that does not support PAuth, thus making the |
| 105 | +assembly code generic. |
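
The trade-off between the two return sequences can be illustrated with a short sketch. The names `PauthSupport` and `return_sequence` are hypothetical, and mnemonic strings stand in for real instructions; the point is only that the generic sequence costs one extra instruction but remains valid when PAuth support is unknown.

```rust
/// What is known about the processor when the code is written or generated.
#[derive(Clone, Copy)]
enum PauthSupport {
    /// PAuth is known to be available, so the combined return can be used.
    Known,
    /// The code must also run on processors without PAuth.
    Unknown,
}

/// Returns the instruction sequence that ends a function whose prologue used `paciasp`.
fn return_sequence(support: PauthSupport) -> &'static [&'static str] {
    match support {
        // One instruction that authenticates the return address and returns.
        PauthSupport::Known => &["retaa"],
        // `autiasp` acts as a NOP without PAuth, so this pair is safe on any AArch64 processor.
        PauthSupport::Unknown => &["autiasp", "ret"],
    }
}
```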
| 106 | + |
| 107 | +# Rationale and alternatives |
| 108 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 109 | + |
| 110 | +Since the existing implementation already uses the standard back-edge CFI techniques that are |
| 111 | +preferred in the absence of special hardware support (i.e. a separate protected stack that is not |
| 112 | +used for buffers that could be accessed out of bounds), the alternative is not to implement the |
| 113 | +proposal, so the rationale is based mainly on the overhead being insignificant. In terms of code |
| 114 | +size, the impact of the back-edge CFI improvements is one additional instruction per function, or two |
| 115 | +for functions implemented in assembly. |
| 116 | + |
| 117 | +The [Clang CFI design][clang-cfi-design] provides an idea for an alternative implementation of the |
| 118 | +forward-edge CFI mechanism that is enabled by BTI. It involves instrumenting every indirect branch |
| 119 | +to check if its destination is permitted. While the overhead of this approach can be reduced by |
| 120 | +using efficient data structures for the destination address lookup and optionally limiting the |
| 121 | +checks only to indirect function calls, it is still significantly larger than the worst-case BTI |
| 122 | +overhead of one instruction per basic block. On the other hand, it does not require any |
| 123 | +special hardware support, so it could be applied to all supported platforms. |
| 124 | + |
| 125 | +[clang-cfi-design]: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html |
| 126 | + |
| 127 | +# Open questions |
| 128 | +[open-questions]: #open-questions |
| 129 | + |
| 130 | +- What is the performance overhead of the proposal? |
| 131 | +- What technologies are available in other instruction set architectures to achieve the same goals? |
| 132 | +- What hardening approaches are applicable to the fiber implementation? The fiber switching code |
| 133 | +saves the values of all callee-saved registers on the stack, i.e. memory that is potentially |
| 134 | +accessible to an attacker. Some of those values could be code addresses that would be used by |
| 135 | +indirect branches, so should we devise a scheme to authenticate them? While the regular pointer |
| 136 | +authentication instructions assume that they are operating on valid virtual addresses (which implies |
| 137 | +that the most significant bits are redundant and could be repurposed), PAuth provides operations to |
| 138 | +authenticate arbitrary data, which could be used in this case. |
| 139 | +- Should we generate the operations that act as `NOP` instructions unconditionally instead (while |
| 140 | +still choosing the shorter alternative sequences if the target supports them)? That would |
| 141 | +especially help the ahead-of-time compilation use case, and could arguably reduce the amount of |
| 142 | +testing, i.e. no need to check both with and without CFI enhancements. |