|
| 1 | +# Zephyr with the Xtensa CALL0 ABI |
| 2 | + |
| 3 | +The Xtensa register window mechanism is a poor fit for memory |
| 4 | +protection. The existing Zephyr optimizations, in particular the |
| 5 | +cross-stack call we do on interrupt entry, can't be used. The way |
| 6 | +Xtensa windows spill is to write register state through the stack |
| 7 | +pointers it finds in the hidden caller registers, which are wholly |
| 8 | +under the control of the userspace app that was interrupted. The |
| 9 | +kernel can't be allowed to do that; it's a huge security hole. |
| 10 | + |
| 11 | +Naively, the kernel would have to write out the entire 64-entry |
| 12 | +register set on every entry from a context with a PS.RING value other |
| 13 | +than zero, including system calls. That's not really an acceptable |
| 14 | +performance or complexity cost. |
| 15 | + |
| 16 | +Instead, for userspace apps, Zephyr builds using the "call0" ABI, |
| 17 | +where code uses only a fixed set of 16 GPRs (see section 10 of the Cadence |
| 18 | +Xtensa Instruction Set Architecture Summary for details on the ABI). |
| 19 | +Kernel traps can then use the remaining GPRs for their own purposes, |
| 20 | +greatly speeding up entry. |
| 21 | + |
| 22 | +## Toolchain |
| 23 | + |
| 24 | +Existing Xtensa toolchains support a ``-mabi=call0`` flag to generate |
| 25 | +code with this ABI, and it works as expected. It defines the |
| 26 | +``__XTENSA_CALL0_ABI__`` preprocessor macro, and the Xtensa HAL code and our |
| 27 | +crt1.S file are set up to honor it appropriately (though there's a |
| 28 | +glitch in the Zephyr HAL integration, see below). |
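
A quick way to confirm that a given compiler really switches ABI with this flag is to dump its predefined macros and look for the one named above. This is only a sketch: `check_xtensa_abi` is a hypothetical helper of mine, not something in Zephyr or the SDK, and it assumes a GCC-style driver that accepts `-dM -E`.

```shell
#!/bin/sh
# Sketch only: check_xtensa_abi is a hypothetical helper, not part of
# Zephyr or the SDK. It assumes a GCC-compatible compiler driver.
#
# Usage: check_xtensa_abi <compiler> [flags...]
# Prints "call0" if __XTENSA_CALL0_ABI__ is predefined, else "other".
check_xtensa_abi() {
    cc=$1
    shift
    # -dM -E dumps the predefined macros for an empty translation unit.
    if "$cc" "$@" -dM -E - </dev/null | grep -q '__XTENSA_CALL0_ABI__'; then
        echo call0
    else
        echo other
    fi
}
```

With a cross toolchain on hand, something like `check_xtensa_abi "${CROSS_COMPILE}gcc" -mabi=call0` should print `call0`, and the same call without the flag should print `other` on a toolchain that defaults to the windowed ABI.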
| 29 | + |
| 30 | +Unfortunately that doesn't extend to binary artifacts. In particular, |
| 31 | +the libgcc.a files generated for our existing SDK toolchains use |
| 32 | +the windowed ABI and will fail at runtime when you hit 64-bit math. |
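
If you are not sure which ABI a particular libgcc.a was built for, one rough heuristic is to disassemble it and look for `entry` instructions, which windowed-ABI functions start with and call0 code does not use. The helper below is a sketch under that assumption. `libgcc_abi` is a hypothetical name, and the pattern assumes the usual whitespace-separated objdump disassembly layout.

```shell
#!/bin/sh
# Sketch only: libgcc_abi is a hypothetical helper. It assumes that
# windowed-ABI functions begin with an "entry" instruction and that
# objdump separates the mnemonic from other columns with whitespace.
#
# Usage: libgcc_abi <objdump> <archive>
libgcc_abi() {
    objdump=$1
    lib=$2
    # grep -c exits non-zero on zero matches, so guard it for "set -e".
    n=$("$objdump" -d "$lib" | grep -c '[[:space:]]entry[[:space:]]' || true)
    if [ "$n" -gt 0 ]; then
        echo windowed
    else
        echo call0
    fi
}
```

For example, `libgcc_abi "${CROSS_COMPILE}objdump" path/to/libgcc.a` would report `windowed` for an archive containing any `entry` instruction.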
| 33 | + |
| 34 | +Cadence toolchains have an automatic multilib scheme and will select a |
| 35 | +compatible libgcc automatically. |
| 36 | + |
| 37 | +But for now, you have to have a separate toolchain (or at least a |
| 38 | +separately-built libgcc) for building call0 apps. I'm using this |
| 39 | +script and it works fine, pending proper SDK integration: |
| 40 | + |
| 41 | +.. code-block:: bash |
| 42 | + #!/bin/sh |
| 43 | + set -ex |
| 44 | + |
| 45 | + TC=$1 |
| 46 | + if [ -z "$TC" ]; then |
| 47 | + TC=dc233c |
| 48 | + fi |
| 49 | + |
| 50 | + # Grab source (these are small) |
| 51 | + git clone https://github.com/zephyrproject-rtos/sdk-ng |
| 52 | + git clone https://github.com/crosstool-ng/crosstool-ng |
| 53 | + |
| 54 | + # Build ct-ng itself |
| 55 | + cd crosstool-ng |
| 56 | + ./bootstrap |
| 57 | + ./configure --enable-local |
| 58 | + make -j$(nproc) |
| 59 | + |
| 60 | + # Configure for the desired toolchain |
| 61 | + ln -s ../sdk-ng/overlays |
| 62 | + cp ../sdk-ng/configs/xtensa-${TC}_zephyr-elf.config .config |
| 63 | + |
| 64 | + grep -v CT_TARGET_CFLAGS .config > asdf |
| 65 | + mv asdf .config |
| 66 | + echo 'CT_TARGET_CFLAGS="-mabi=call0"' >> .config |
| 67 | + echo CT_LIBC_NONE=y >> .config |
| 68 | + |
| 69 | + ./ct-ng olddefconfig |
| 70 | + ./ct-ng build.$(nproc) |
| 71 | + |
| 72 | + echo "##" |
| 73 | + echo "## Set these values to enable this toolchain:" |
| 74 | + echo "##" |
| 75 | + echo export CROSS_COMPILE=$HOME/x-tools/xtensa-${TC}_zephyr-elf/bin/xtensa-${TC}_zephyr-elf- |
| 76 | + echo export ZEPHYR_TOOLCHAIN_VARIANT=cross-compile |
| 77 | + |
| 78 | +Note that you don't really need to use the toolchain that was built. It's |
| 79 | +enough to take its libgcc.a and drop it on top of the one in your SDK; |
| 80 | +just be careful to save the original first. |
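
The swap can be sketched as a small helper that refuses to clobber an existing backup. `install_libgcc` and the `.orig` suffix are my own illustrative choices, not an SDK convention, and both paths are ones you supply.

```shell
#!/bin/sh
# Sketch only: install_libgcc is a hypothetical helper, and the ".orig"
# backup name is an arbitrary choice.
#
# Usage: install_libgcc <new_libgcc.a> <sdk_libgcc.a>
install_libgcc() {
    new=$1
    target=$2
    [ -f "$new" ] || { echo "missing $new" >&2; return 1; }
    [ -f "$target" ] || { echo "missing $target" >&2; return 1; }
    # Back up only once, so re-running never overwrites the true original.
    [ -e "$target.orig" ] || cp -p "$target" "$target.orig"
    cp "$new" "$target"
}
```

GCC's `-print-libgcc-file-name` option prints the libgcc.a a given compiler will link, which is one way to find both paths. Restoring the SDK is then a matter of copying the `.orig` file back.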
| 81 | + |
| 82 | +## Bugs and Missing Features |
| 83 | + |
| 84 | +No support in the SDK. See above about toolchains. |
| 85 | + |
| 86 | +For simplicity, this code is written to save context to the arch |
| 87 | +region of the thread struct and not the stack (except for nested |
| 88 | +interrupts, obviously). That's a common pattern in Zephyr, but for |
| 89 | +KERNEL_COHERENCE (SMP) platforms it's actually a performance headache, |
| 90 | +because the stack is cached where the thread struct is not. We should |
| 91 | +move this back to the stack, but that requires doing some logic in the |
| 92 | +assembly to check that the resulting stack pointer doesn't overflow |
| 93 | +the protected region, which is slightly non-trivial (or at least needs |
| 94 | +a little inspiration). |
| 95 | + |
| 96 | +Right now the Zephyr HAL integration doesn't build when the call0 flag |
| 97 | +is set because of a duplicated symbol. I just disabled the build at |
| 98 | +the cmake level for now, but this needs to be figured out. |
| 99 | + |
| 100 | +The ARCH_EXCEPT() handling is a stub and doesn't actually do anything. |
| 101 | +Really this isn't any different than the existing code; it just lives |
| 102 | +in the asm2 files that were disabled and I have to find a shared |
| 103 | +location for it. |
| 104 | + |
| 105 | +Backtrace logging is likewise specific to the older frame format and |
| 106 | +ABI and doesn't work. The call0 ABI is actually much simpler, though, |
| 107 | +so this shouldn't be hard to make work. |
| 108 | + |
| 109 | +FPU support isn't enabled. Likewise this isn't any different (except |
| 110 | +trivially -- the context struct has a different layout). We just need |
| 111 | +to copy code and find compatible hardware to test it on. |
| 112 | + |
| 113 | +I realized when writing this that our existing handling for NMI |
| 114 | +exceptions (strictly any level > EXCM_LEVEL) isn't correct, as we |
| 115 | +generate ZSR-style EPS/EPC register usage in those handlers that isn't |
| 116 | +strictly separated from the code it may/will have interrupted. This |
| 117 | +is actually a bug with asm2 also, but it's going to make |
| 118 | +high-priority/DEBUG/NMI exceptions unreliable unless fixed (I don't |
| 119 | +think there are any Zephyr users currently though). |