[RFC] Intel CET support #91024

Open
wants to merge 21 commits into
base: main

edersondisouza
Contributor

Initial support for Control-flow Enforcement Technology (CET). Both features, Indirect Branch Tracking (IBT) and Shadow Stack, are supported on both x86 and x86_64.

The IBT support is quite straightforward: code that enables it on Zephyr, compiler flags, tests, and some endbr32 (or endbr64) instructions sprinkled in some asm code. The main issue here is that it needs toolchain support: not only the compiler, but anything that may end up in the final ELF file, such as libc or libgcc, needs to have been compiled with the appropriate flags. Toolchain work is coming, but this PR doesn't depend on it.

On the other hand, shadow stack support is much more interesting. In this PR, patches add support for it incrementally. First, x86 support is added, followed by a generic interface, loosely named "Hardware Shadow Stack", so that the threads created by Zephyr itself can also have a shadow stack, and finally "automatic" support, where one only needs to enable shadow stack in Kconfig to have it working. This order seems more natural to review, but it also carries more changes than strictly needed (as some things are created in an x86-specific way and then converted to the generic approach). Let me know if this is an issue.

The automatic support is there to reduce the friction of using shadow stacks: instead of defining a shadow stack every time a normal stack is defined, Zephyr does that under the hood. Some tunable configs define how big the shadow stack should be, as a fraction of the ordinary stack (with a minimum shadow stack size). Note that while it is calculated from the ordinary thread stack size, the shadow stack size is in addition to it, so ordinary stacks don't suddenly get smaller.
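
To make the sizing rule concrete, here is a small Python model of the arithmetic. It is only a sketch: the percentage and minimum correspond to CONFIG_HW_SHADOW_STACK_PERCENTAGE_SIZE and CONFIG_HW_SHADOW_STACK_MIN_SIZE from the commits in this PR, but the function names, the 8-byte rounding (a stand-in for whatever platform restrictions apply) and the example values are assumptions, not taken from the patches.

```python
def shadow_stack_size(stack_size: int, percentage: int, min_size: int,
                      align: int = 8) -> int:
    """Model of the shadow stack size for one thread stack.

    percentage and min_size play the role of the two Kconfig options;
    align is a hypothetical platform restriction on the final size.
    """
    if stack_size < 0 or percentage < 0 or min_size < 0 or align <= 0:
        raise ValueError("sizes must be non-negative and align positive")
    size = stack_size * percentage // 100   # fraction of the ordinary stack
    size = max(size, min_size)              # never below the configured minimum
    return (size + align - 1) // align * align  # round up to the alignment


def total_footprint(stack_size: int, percentage: int, min_size: int,
                    align: int = 8) -> int:
    """The shadow stack is added on top; the ordinary stack keeps its size."""
    return stack_size + shadow_stack_size(stack_size, percentage, min_size, align)


if __name__ == "__main__":
    # Hypothetical settings: 30% of the stack, at least 256 bytes.
    for stack in (512, 1024, 4096):
        print(stack, shadow_stack_size(stack, 30, 256), total_footprint(stack, 30, 256))
```

With these hypothetical settings, small stacks are dominated by the minimum, while larger stacks scale with the percentage.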

The x86 shadow stacks need some initialization. This PR focuses on statically created threads (and their stacks), so it also focuses on statically initializing their shadow stacks. This usually means some C macros are used for such initialization. However, stack arrays can't be initialized this way, as some arrays are defined using expressions (such as CONFIG_MP_MAX_NUM_CPUS - 1) that don't play well with macro helpers such as LISTIFY. For this case, a scripted approach to initialize the stack arrays is used.

While this patch is about statically created threads, dynamic ones using pool-based allocation should work.

Note that this PR doesn't deal with userspace support.

Testing

There's no straightforward way to test this work, unfortunately. While current Zephyr toolchains can add the "landing pads" for IBT, their own bits (such as libgcc) won't have them. One can work around that by building the libc (for instance, by using CONFIG_PICOLIBC_USE_MODULE=y) and not using other toolchain-specific bits. But the biggest issue is how to test shadow stack support.

While there are patches out there to add support for CET on QEMU and KVM, they are not upstream yet. I have branches for both QEMU and KVM. Note that I rebased the patches on both trees, so things could be wrong, but they do seem to work. Then, one would only need this patch, which adds a QEMU x86_64 board with CET support.

Please review - as always, comments, suggestions and critiques are welcome! I'm especially interested in feedback on the automatic and generic support for shadow stacks. It is hopefully generic enough for different shadow stack schemes, but let me know!

@fabiobaltieri
Member

fabiobaltieri commented Jun 4, 2025

Hey, this needs a rebase and force push to pick up the CI error workaround.

Some SoCs may need to do some preparatory work before changing the
current shadow stack pointer (and thus the shadow stack currently in
use). This patch adds a way to do that, guarded by a Kconfig option
(CONFIG_X86_CET_SOC_PREPARE_SHADOW_STACK_SWITCH).

As currently only 32-bit SoCs may use this, support is only added to the
32-bit code.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

In order to allow kernel-created threads (such as the main and idle
threads) to make use of a hardware shadow stack implementation, add an
interface for them.

This patch provides the infrastructure that architectures need to
implement to provide hardware shadow stacks.

Also, the main and idle threads are updated to make use of this
interface (if hardware shadow stacks are enabled).

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

So that kernel created threads can use shadow stacks. Note that
CONFIG_X86_CET_SHADOW_STACK is abandoned in favour of
CONFIG_HW_SHADOW_STACK.

This means changing some types, functions and macros throughout the
shadow stack code.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

It seems that, at least in tests, it's common to call k_thread_create()
on the same thread multiple times. This trips a check for the CET shadow
stack, namely, setting a shadow stack on a thread which already has one.

This patch adds a Kconfig option to allow that, iff the base address and
size of the new shadow stack are the same as before. This will trigger a
reset of the shadow stack, so it can be reused.

It may be the case that this behaviour (reusing threads) is common
outside tests too, in which case it could make sense to change the
default - in this patch, it is only enabled if ZTEST.

Even if it ends up enabled by default, it would still make sense to keep
this option - more careful apps could avoid the need for the shadow
stack reset code altogether.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

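The reuse rule from the commit above can be modelled in a few lines of Python. This is a sketch of the decision logic only, under stated assumptions: the class and function names, the `allow_reuse` flag standing in for the new Kconfig option, and the use of an exception for the rejected case are all hypothetical and do not mirror the actual kernel code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShadowStack:
    base: int
    size: int


class Thread:
    """Minimal stand-in for a thread object (hypothetical)."""

    def __init__(self) -> None:
        self.shadow_stack: Optional[ShadowStack] = None
        self.reset_count = 0  # counts how often the shadow stack was reset


class ShadowStackAlreadySet(Exception):
    """Raised when a thread already owns a shadow stack and reuse is refused."""


def attach_shadow_stack(thread: Thread, base: int, size: int,
                        allow_reuse: bool) -> bool:
    """Attach a shadow stack; return True if an existing one was reset."""
    current = thread.shadow_stack
    if current is None:
        thread.shadow_stack = ShadowStack(base, size)
        return False
    # Reuse is allowed only for the very same region (same base and size).
    if allow_reuse and (current.base, current.size) == (base, size):
        thread.reset_count += 1  # stands in for re-initialising the region
        return True
    raise ShadowStackAlreadySet(f"thread already has a shadow stack at {current.base:#x}")
```

In this model, creating the same thread twice with the same shadow stack succeeds and resets it, while any other second attachment is rejected, which matches the check described above.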
The test could trip shadow stack protections instead of the normal stack
sentinel, which would require special handling for this case. Just skip
this test instead if CONFIG_HW_SHADOW_STACK=y.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

A new cmake list, `post_build_patch_elf_commands`, is prepended to the
`post_build_commands` one, effectively making these commands run just
after the ELF is created. It's particularly useful for operations that
patch the generated ELF before other representations of it (such as
.bin) are created.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

This patch modifies thread stack macros (such as K_KERNEL_STACK_DECLARE
or K_KERNEL_STACK_ARRAY_DECLARE) to also create a HW shadow stack (when
CONFIG_HW_SHADOW_STACK=y), as well as define a pairing between the
thread stack (or thread stack array) and the shadow stack (or shadow
stack array).

This pairing, which currently is simply an array of pairs (stack,
shadow_stack), is searched during thread setup to find the corresponding
shadow stack and attach it to the thread. If linear search on this array
proves to be a performance issue, the actual structure can be revisited.

To define the size of the shadow stack for a given stack, the stack size
is used. A new Kconfig, CONFIG_HW_SHADOW_STACK_PERCENTAGE_SIZE, is used
to define how big the shadow stack is compared to the stack. Note that
this size is in *addition* to the stack size. To avoid some shadow
stacks becoming too small, CONFIG_HW_SHADOW_STACK_MIN_SIZE is used to
define a minimum size. Note that after this size is defined, platform
restrictions on the size of the shadow stack are still applied.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

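The pairing lookup described in the commit above amounts to a linear search keyed by the thread stack address. The following Python model is a sketch under stated assumptions: the `StackPair` layout and the function name are hypothetical, and stack arrays are not modelled.

```python
from typing import NamedTuple, Optional, Sequence


class StackPair(NamedTuple):
    """One (stack, shadow_stack) pairing entry (hypothetical layout)."""
    stack_addr: int
    shstk_addr: int
    shstk_size: int


def find_shadow_stack(pairs: Sequence[StackPair],
                      stack_addr: int) -> Optional[StackPair]:
    """Linear search over the pairing table, as done at thread setup."""
    for pair in pairs:
        if pair.stack_addr == stack_addr:
            return pair
    return None  # no shadow stack was generated for this stack


# Example table with made-up addresses.
PAIRS = [
    StackPair(stack_addr=0x2000, shstk_addr=0x8000, shstk_size=256),
    StackPair(stack_addr=0x3000, shstk_addr=0x8100, shstk_size=512),
]
```

The cost grows with the number of statically defined stacks, which is why the commit notes that the structure can be revisited if the search ever becomes a performance issue.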
   - No more need for special IRQ shadow stacks - just reuse the one
   created for z_interrupt_stacks;
   - Add the linker sections for the pairs of stack/shadow stack;
   - Support shadow stack arrays.

The last item was a bit challenging: shadow stacks need to be initialised
before use, and this is done statically for normal shadow stacks. To
initialise the shadow stacks in an array, one needs to know how many
entries it has. While a simple approach would use `LISTIFY` to then do
the initialisation on all entries, that is not possible as many stack
arrays are created using expressions instead of literals, such as
`CONFIG_MP_MAX_NUM_CPUS - 1`, which won't work with `LISTIFY`.

Instead, this patch uses a script, `gen_static_shstk_array.py`, that
gathers all needed information and patches the ELF to initialise the
stack arrays. Note that this needs to be done before any other operation
on the ELF file that creates new representations, such as the .bin
output.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

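To illustrate the kind of patching involved, here is a Python sketch that initialises every entry of a shadow stack array inside a raw byte image. It is not the logic of `gen_static_shstk_array.py`: the function name, its parameters, and the contents written (an 8-byte little-endian token holding its own address in the top slot of each entry, modelled on the x86 supervisor shadow stack token) are assumptions made for illustration. The entry count is derived from the total array size, which a script can read from the ELF even when the C source uses an expression such as `CONFIG_MP_MAX_NUM_CPUS - 1`.

```python
import struct

TOKEN_SIZE = 8  # assumed: one 64-bit token per shadow stack


def init_shadow_stack_array(image: bytearray, offset: int, base_addr: int,
                            array_bytes: int, entry_size: int) -> int:
    """Write a token at the top of each shadow stack in the array.

    image       -- raw bytes of the section holding the array
    offset      -- offset of the array within image
    base_addr   -- run-time address of the array's first byte
    array_bytes -- total size of the array (e.g. from the symbol size)
    entry_size  -- size of one shadow stack in the array
    Returns the number of entries initialised.
    """
    if entry_size < TOKEN_SIZE or entry_size % TOKEN_SIZE:
        raise ValueError("entry size must be a non-zero multiple of 8")
    if array_bytes % entry_size:
        raise ValueError("array size is not a multiple of the entry size")
    if offset < 0 or offset + array_bytes > len(image):
        raise ValueError("array does not fit in the image")

    entries = array_bytes // entry_size
    for i in range(entries):
        top = (i + 1) * entry_size - TOKEN_SIZE  # last slot of entry i
        # Assumed token: the address of the slot itself.
        struct.pack_into("<Q", image, offset + top, base_addr + top)
    return entries
```

In the real flow, the patched bytes would be written back into the ELF before any other representation (such as the .bin) is generated from it.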
A limitation of the current HW shadow stack implementation for x86 is
that two or more thread stacks can't share the same variable name, even
if they are static. To avoid skipping useful tests while this limitation
holds, change the name of the stack variable.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

Ensure they behave as expected.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>

@edersondisouza
Contributor Author

v4:

  • cmake style fixes;
  • Improve Kconfig description;
  • Rebased.

@edersondisouza
Contributor Author

> Also not sure if it's by intention or not but nothing in-tree uses this so twister is filtering all boards on the tests and they're not running

Yeah, testing it is a bit involved right now - needs kvm and qemu patches - as described in the description of this PR.


9 participants