Commit 4954981

arch/xtensa: Add a call0 ABI variant
New context layer for building Zephyr with the call0 ABI. The system call overhead of having to spill 64 GPRs, and the general complexity of the existing asm2 layer, is prohibitive when trying to build a userspace kernel entry path for Xtensa.

Note: it still uses the windowed registers, but as a quick-switch register set for optimized interrupt entry. This is not a mechanism for building Zephyr on hardware that actually lacks windows (though with some work it could be).

This is much (much) leaner in terms of entry/exit overhead, is significantly smaller, and should scale better with changes needed for userspace. There is a cost in terms of function call code size and execution speed, but in practice it seems manageable (I'm measuring ~4% with a no-frame-pointer build, Max Filippov reports more like ~13% in Linux builds -- both numbers are well under the kind of overhead we see enabling userspace at all).

Signed-off-by: Andy Ross <andyross@google.com>
1 parent 058f21d commit 4954981

8 files changed: +871 -8 lines changed

arch/xtensa/core/CMakeLists.txt

Lines changed: 6 additions & 2 deletions
@@ -8,11 +8,15 @@ zephyr_library_sources(
   cpu_idle.c
   fatal.c
   window_vectors.S
-  xtensa-asm2-util.S
-  xtensa-asm2.c
   irq_manage.c
 )
 
+if(CONFIG_XTENSA_CALL0_ABI)
+  zephyr_library_sources(xtensa-win0.S xtensa-win0.c)
+else()
+  zephyr_library_sources(xtensa-asm2-util.S xtensa-asm2.c)
+endif()
+
 zephyr_library_sources_ifdef(CONFIG_XTENSA_USE_CORE_CRT1 crt1.S)
 zephyr_library_sources_ifdef(CONFIG_IRQ_OFFLOAD irq_offload.c)
 zephyr_library_sources_ifdef(CONFIG_THREAD_LOCAL_STORAGE tls.c)

arch/xtensa/core/README-CALL0.rst

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+# Zephyr with the Xtensa CALL0 ABI
+
+The Xtensa register window mechanism is a poor fit for memory
+protection. The existing Zephyr optimizations, in particular the
+cross-stack call we do on interrupt entry, can't be used. The way
+Xtensa windows spill is to write register state through the stack
+pointers it finds in the hidden caller registers, which are wholly
+under the control of the userspace app that was interrupted. The
+kernel can't be allowed to do that; it's a huge security hole.
+
+Naively, the kernel would have to write out the entire 64-entry
+register set on every entry from a context with a PS.RING value other
+than zero, including system calls. That's not really an acceptable
+performance or complexity cost.
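
Reviewer note: as a rough illustration of the condition in this paragraph, the ring level can be read out of a saved PS value. This is only a sketch. The helper and macro names are hypothetical, and the field position (bits 7:6) is an assumption based on the usual Xtensa PS layout that should be checked against the core's own headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PS.RING field position (bits 7:6). Verify against the core
 * configuration headers before relying on it.
 */
#define SKETCH_PS_RING_SHIFT 6U
#define SKETCH_PS_RING_MASK  (0x3U << SKETCH_PS_RING_SHIFT)

/* Hypothetical helper: extract the privilege ring from a saved PS value. */
static inline uint32_t sketch_ps_ring(uint32_t ps)
{
	return (ps & SKETCH_PS_RING_MASK) >> SKETCH_PS_RING_SHIFT;
}

/* Hypothetical helper: true when the interrupted context was not in
 * ring 0, which is the case where a windowed kernel would need the
 * full 64-entry spill described above.
 */
static inline bool sketch_entered_from_user(uint32_t ps)
{
	return sketch_ps_ring(ps) != 0U;
}
```

A trap handler could branch on `sketch_entered_from_user(saved_ps)` to pick the expensive path only for non-kernel contexts.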
+
+Instead, for userspace apps, Zephyr builds using the "call0" ABI,
+where only a fixed set of 16 GPRs is used (see section 10 of the
+Cadence Xtensa Instruction Set Architecture Summary for details on
+the ABI). Kernel traps can then use the remaining GPRs for their own
+purposes, greatly speeding up entry.
+
+## Toolchain
+
+Existing Xtensa toolchains support a ``-mabi=call0`` flag to generate
+code with this ABI, and it works as expected. It sets a
+__XTENSA_CALL0_ABI__ preprocessor flag and the Xtensa HAL code and our
+crt1.S file are set up to honor it appropriately (though there's a
+glitch in the Zephyr HAL integration, see below).
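
Reviewer note: code that has to behave differently under the two ABIs can key off the preprocessor flag named above. The sketch below is an illustration, not code from this commit. The function name is hypothetical, and `__XTENSA_WINDOWED_ABI__` is assumed to be the matching flag GCC sets for the windowed ABI.

```c
/* Returns a short name for the ABI this translation unit was built with.
 * __XTENSA_CALL0_ABI__ is the flag named in the README text;
 * __XTENSA_WINDOWED_ABI__ is assumed to be its windowed counterpart.
 */
static inline const char *sketch_xtensa_abi(void)
{
#if defined(__XTENSA_CALL0_ABI__)
	return "call0";
#elif defined(__XTENSA_WINDOWED_ABI__)
	return "windowed";
#else
	return "not-xtensa"; /* e.g. a host build used for unit tests */
#endif
}
```

Built with ``-mabi=call0`` on an Xtensa toolchain this should take the first branch. On a host compiler it falls through to the last one.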
+
+Unfortunately that doesn't extend to binary artifacts. In particular
+the libgcc.a files generated for our existing SDK toolchains are using
+the windowed ABI and will fail at runtime when you hit 64-bit math.
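
Reviewer note: for context on why 64-bit math is where this shows up, on 32-bit targets GCC typically lowers 64-bit division to a call into libgcc (helpers such as `__udivdi3`) instead of emitting inline code. A function like the hypothetical one below can therefore pull in a libgcc routine built for the wrong ABI. Whether a given operation really becomes a libgcc call depends on the core configuration and optimization level, so treat this as an illustration only.

```c
#include <stdint.h>

/* Hypothetical example: 64-bit unsigned division with a runtime divisor.
 * On a 32-bit target this is usually compiled to a libgcc helper call,
 * which is the kind of code path the README warns about.
 * The caller must ensure den != 0.
 */
static inline uint64_t sketch_div64(uint64_t num, uint64_t den)
{
	return num / den;
}
```

Inspecting the disassembly of such a function in a call0 build is one quick way to see which libgcc symbols an application depends on.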
+
+Cadence toolchains have an automatic multilib scheme and will select a
+compatible libgcc automatically.
+
+But for now, you have to have a separate toolchain (or at least a
+separately-built libgcc) for building call0 apps. I'm using this
+script and it works fine, pending proper SDK integration:
+
+.. code-block:: bash
+    #!/bin/sh
+    set -ex
+
+    TC=$1
+    if [ -z "$TC" ]; then
+        TC=dc233c
+    fi
+
+    # Grab source (these are small)
+    git clone https://github.com/zephyrproject-rtos/sdk-ng
+    git clone https://github.com/crosstool-ng/crosstool-ng
+
+    # Build ct-ng itself
+    cd crosstool-ng
+    ./bootstrap
+    ./configure --enable-local
+    make -j$(nproc)
+
+    # Configure for the desired toolchain
+    ln -s ../sdk-ng/overlays
+    cp ../sdk-ng/configs/xtensa-${TC}_zephyr-elf.config .config
+
+    grep -v CT_TARGET_CFLAGS .config > asdf
+    mv asdf .config
+    echo CT_TARGET_CFLAGS="-mabi=call0" >> .config
+    echo CT_LIBC_NONE=y >> .config
+
+    ./ct-ng olddefconfig
+    ./ct-ng build.$(nproc)
+
+    echo "##"
+    echo "## Set these values to enable this toolchain:"
+    echo "##"
+    echo export CROSS_COMPILE=$HOME/x-tools/xtensa-${TC}_zephyr-elf/bin/xtensa-${TC}_zephyr-elf-
+    echo export ZEPHYR_TOOLCHAIN_VARIANT=cross-compile
+
+Note you don't really need to use the toolchain that was built; it's
+enough to take the libgcc.a and drop it on top of the one in your SDK.
+Just be careful to save the original, etc...
+
+## Bugs and Missing Features
+
+No support in the SDK. See above about toolchains.
+
+For simplicity, this code is written to save context to the arch
+region of the thread struct and not the stack (except for nested
+interrupts, obviously). That's a common pattern in Zephyr, but for
+KERNEL_COHERENCE (SMP) platforms it's actually a performance headache,
+because the stack is cached where the thread struct is not. We should
+move this back to the stack, but that requires doing some logic in the
+assembly to check that the resulting stack pointer doesn't overflow
+the protected region, which is slightly non-trivial (or at least needs
+a little inspiration).
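
Reviewer note: the check this paragraph alludes to might look roughly like the following in C, before being translated to assembly. It is a sketch under stated assumptions: the function and parameter names are hypothetical, the stack is assumed to grow downward, and the interrupted stack pointer is treated as untrusted because userspace controls it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: can a context frame of frame_size bytes be pushed
 * below sp while staying inside [stack_base, stack_top]?
 * Comparisons are ordered so that no subtraction can wrap around, even
 * for a hostile sp value.
 */
static inline bool sketch_frame_fits(uintptr_t sp, uintptr_t stack_base,
				     uintptr_t stack_top, size_t frame_size)
{
	if (sp < stack_base || sp > stack_top) {
		return false; /* sp is outside the thread's stack region */
	}
	return (sp - stack_base) >= frame_size;
}
```

Comparing `sp - stack_base` with the frame size, instead of computing `sp - frame_size` first, avoids an unsigned underflow when `sp` is small.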
+
+Right now the Zephyr HAL integration doesn't build when the call0 flag
+is set because of a duplicated symbol. I just disabled the build at
+the cmake level for now, but this needs to be figured out.
+
+The ARCH_EXCEPT() handling is a stub and doesn't actually do anything.
+Really this isn't any different than the existing code, it just lives
+in the asm2 files that were disabled and I have to find a shared
+location for it.
+
+Backtrace logging is likewise specific to the older frame format and
+ABI and doesn't work. The call0 ABI is actually much simpler, though,
+so this shouldn't be hard to make work.
+
+FPU support isn't enabled. Likewise this isn't any different (except
+trivially -- the context struct has a different layout). We just need
+to copy code and find compatible hardware to test it on.
+
+I realized when writing this that our existing handling for NMI
+exceptions (strictly any level > EXCM_LEVEL) isn't correct, as we
+generate ZSR-style EPS/EPC register usage in those handlers that isn't
+strictly separated from the code it may/will have interrupted. This
+is actually a bug with asm2 also, but it's going to make
+high-priority/DEBUG/NMI exceptions unreliable unless fixed (I don't
+think there are any Zephyr users currently though).

arch/xtensa/core/offsets/offsets.c

Lines changed: 46 additions & 0 deletions
@@ -7,6 +7,7 @@
 #include <kernel_offsets.h>
 
 #include <xtensa-asm2-context.h>
+#include <zephyr/arch/xtensa/xtensa-win0.h>
 
 GEN_ABSOLUTE_SYM(___xtensa_irq_bsa_t_SIZEOF, sizeof(_xtensa_irq_bsa_t));
 GEN_ABSOLUTE_SYM(___xtensa_irq_stack_frame_raw_t_SIZEOF, sizeof(_xtensa_irq_stack_frame_raw_t));
@@ -60,4 +61,49 @@ GEN_OFFSET_SYM(_xtensa_irq_bsa_t, fpu14);
 GEN_OFFSET_SYM(_xtensa_irq_bsa_t, fpu15);
 #endif
 
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a0);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a1);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a2);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a3);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a4);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a5);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a6);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a7);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a8);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a9);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a10);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a11);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a12);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a13);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a14);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, a15);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, ps);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, pc);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, sar);
+#if XCHAL_HAVE_LOOPS
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, lcount);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, lend);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, lbeg);
+#endif
+#if XCHAL_HAVE_S32C1I
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, scompare1);
+#endif
+#if XCHAL_HAVE_THREADPTR
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, threadptr);
+#endif
+#ifdef CONFIG_FPU_SHARING
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, fcr);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, fsr);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, fregs[16]);
+#endif
+/* #ifdef CONFIG_USERSPACE */
+/* Always enabled currently to support a syscall mocking layer */
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, user_a0);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, user_a1);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, user_pc);
+GEN_OFFSET_SYM(xtensa_win0_ctx_t, user_ps);
+/* #endif */
+
+GEN_ABSOLUTE_SYM(__xtensa_win0_ctx_t_SIZEOF, sizeof(xtensa_win0_ctx_t));
+
 GEN_ABS_SYM_END
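
Reviewer note: to make concrete what the GEN_OFFSET_SYM() lines above provide to the assembly, here is a host-side sketch of the same idea using plain `offsetof`. The struct is a hypothetical stand-in that covers only the first unconditional fields, with every field assumed to be a 32-bit word. The real `xtensa_win0_ctx_t` is defined in xtensa-win0.h and may be laid out differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the context struct: 16 GPRs followed by
 * PS, PC and SAR, each assumed to be one 32-bit word.
 */
struct sketch_win0_ctx {
	uint32_t a0, a1, a2, a3, a4, a5, a6, a7;
	uint32_t a8, a9, a10, a11, a12, a13, a14, a15;
	uint32_t ps;
	uint32_t pc;
	uint32_t sar;
};

/* Compile-time constants of the kind the offsets file exports so that
 * assembly can address fields without hard-coded numbers.
 */
enum {
	SKETCH_CTX_A1_OFFSET = offsetof(struct sketch_win0_ctx, a1),
	SKETCH_CTX_PS_OFFSET = offsetof(struct sketch_win0_ctx, ps),
	SKETCH_CTX_SIZEOF = sizeof(struct sketch_win0_ctx),
};
```

Under these assumptions the constants follow directly from the field order, so reordering the struct would change them without any edit to the code that uses them.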
