@mihalicyn
Member

During our work on arm64 GCS (aka shadow stack) we found (thanks to sysctl -w kernel.randomize_va_space=0) that the way we restore the x86 shadow stack is not quite correct. I need to dive into the details to explain why.

During shadow stack restore we are abusing so-called "premapped" VMAs. This is purely a CRIU concept, and we use it to ensure proper CoW restoration for anonymous VMAs. The idea is simple: early in the restore we prepare the VMAs and fill them with data from the page images. Then, when we start forking processes, they naturally inherit those VMAs and the CoW flags are properly set in the kernel. Of course, this has nothing to do with the shadow stack, which is why I said "abusing" instead of "using". :)
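To make the premapping idea concrete, here is a minimal sketch. It is not CRIU code: premap_cow_demo() is a hypothetical name, and a memset() stands in for filling the VMA from page images. It only shows that a private anonymous mapping filled before fork() is inherited by the child, and that a later write in the child stays private to it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Illustration only (hypothetical helper, not CRIU code).
 * Returns 0 if the child saw the premapped data and its write did not
 * leak back into the parent, -1 otherwise.
 */
static int premap_cow_demo(size_t len)
{
	unsigned char *premapped;
	int status, parent_ok;
	pid_t pid;

	assert(len > 0);

	/* "Premap": create the private anonymous VMA before forking. */
	premapped = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (premapped == MAP_FAILED)
		return -1;

	/* Stand-in for filling the VMA from page images. */
	memset(premapped, 0xAB, len);

	pid = fork();
	if (pid < 0) {
		munmap(premapped, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: check the inherited contents, then write to the page. */
		int ok = premapped[0] == 0xAB;

		premapped[0] = 0x01;
		_exit(ok && premapped[0] == 0x01 ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) != pid) {
		munmap(premapped, len);
		return -1;
	}

	/* Parent: the child's write must not be visible here. */
	parent_ok = premapped[0] == 0xAB;
	munmap(premapped, len);

	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return parent_ok ? 0 : -1;
}
```

Calling premap_cow_demo(4096) from a small main() is enough to try it on a Linux box.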

So, for the shadow stack we were "premapping" a VMA (importantly, a regular anonymous one, NOT a shadow stack one) of size original_shadow_stack_size + 1 page. The intention was simple: we put the original shadow stack contents into it, and the extra page served as a placeholder for the temporary shadow stack that the restorer PIE needs to function properly. Then, once we reached the vma_remap(vma_entry, args->uffd) loops, we simply skipped those premapped VMAs, because at that point we still needed them: the shadow stack was restored only after those loops. And this is the problem. We can't just skip mappings in those loops (

if (vma_remap(vma_entry, args->uffd))
and
if (vma_remap(vma_entry, args->uffd))
) because the entire algorithm only works if VMAs are remapped in the proper order.

Igor (@svilenkov) noticed that something was failing there on his test system all the time. This was a hint for us to recheck the x86 shadow stack too, but it wasn't failing. The missing piece of the puzzle was sysctl -w kernel.randomize_va_space=0, which Igor was using on his test VM. Applying this setting on my x86 machine immediately made the x86 shadow stack restore fail too. Like this:

# ./test/zdtm.py run -t zdtm/static/pthread00 -f h
userns is supported
=== Run 1/1 ================ zdtm/static/pthread00
======================== Run zdtm/static/pthread00 in h ========================
Start test
./pthread00 --pidfile=pthread00.pid --outfile=pthread00.out
Run criu dump
Run criu restore
=[log]=> dump/zdtm/static/pthread00/62/1/restore.log
------------------------ grep Error ------------------------
b'(00.008718) pie: 62: Remap 0x7ffff495f000->0x7ffff59fe000 len 0x1000'
b'(00.008726) pie: 62: Remap 0x7ffff415f000->0x7ffff51fe000 len 0x800000'
b'(00.008744) pie: 62: Remap 0x7ffff415e000->0x7ffff51fd000 len 0x1000'
b'(00.008753) pie: 62: Remap 0x7fffef97e000->0x7ffff0021000 len 0x3fdf000'
b'(00.008758) pie: 62: Error (criu/pie/restorer.c:1153): Unable to map a guard page 0x7ffff3fff000 (0x7ffff7fba000)'
b'(00.008768) pie: 62: Error (criu/pie/restorer.c:2329): Restorer fail 62'
b'(00.008785) Error (criu/cr-restore.c:2324): Restoring FAILED.'
b'(00.008928) Error (criu/cr-restore.c:1258): 62 exited, status=1'
------------------------ ERROR OVER ------------------------
############### Test zdtm/static/pthread00 FAIL at CRIU restore ################

or like this:

# ./test/zdtm.py run -t zdtm/static/pthread00 -f h
userns is supported
=== Run 1/1 ================ zdtm/static/pthread00
======================== Run zdtm/static/pthread00 in h ========================
Start test
./pthread00 --pidfile=pthread00.pid --outfile=pthread00.out
Run criu dump
Run criu restore
=[log]=> dump/zdtm/static/pthread00/62/1/restore.log
------------------------ grep Error ------------------------
b'(00.008507) pie: 62: Restoring scheduler params 0.0.0'
b'(00.008517) pie: 62: rseq: rseq_abi_pointer = 0x7ffff7f9e060 signature = 0x53053053'
b'(00.008520) pie: 62: Using clone3 to restore the process'
b'(00.008544) pie: 62: Using clone3 to restore the process'
b'(00.008551) pie: 63: Error (criu/arch/x86/include/asm/shstk.h:89): Failed to map shadow stack at 0x7ffff6200000: -17'
b'(00.008554) pie: 63: Error (criu/pie/restorer.c:830): Restorer abnormal termination for 62'
b'(00.008583) Error (criu/cr-restore.c:2324): Restoring FAILED.'
b'(00.008702) Error (criu/cr-restore.c:1258): 62 exited, status=1'
------------------------ ERROR OVER ------------------------
############### Test zdtm/static/pthread00 FAIL at CRIU restore ################

The idea for the right fix belongs to Andrei. We have to restore the shadow stack VMA right in the vma remap loop. Once it is our turn, we:

  1. create the shadow stack VMA in a random place
  2. fill it with data from the premapped VMA
  3. unmap the premapped VMA
  4. mark the shadow stack VMA address as the "premapped" one
  5. let the vma_remap function do its job and properly mremap it to the final destination
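The steps above can be sketched roughly as follows. This is an illustration under loud assumptions, not the code from this PR: regular anonymous mappings stand in for the shadow stack VMA (a real one would need map_shadow_stack() and CET-capable hardware), and restore_via_staging() and its demo are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: a plain anonymous mapping plays the role of the
 * shadow stack VMA. Returns the final address or NULL on failure.
 */
static void *restore_via_staging(void *premapped, size_t len, void *final_addr)
{
	void *staging, *moved;

	assert(premapped != NULL && final_addr != NULL && len > 0);

	/* 1. Create the new VMA wherever the kernel places it. */
	staging = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (staging == MAP_FAILED)
		return NULL;

	/* 2. Fill it with the data kept in the premapped VMA. */
	memcpy(staging, premapped, len);

	/* 3. The premapped VMA is no longer needed. */
	if (munmap(premapped, len)) {
		munmap(staging, len);
		return NULL;
	}

	/* 4+5. Treat staging as the premapped address and move it into place. */
	moved = mremap(staging, len, len, MREMAP_MAYMOVE | MREMAP_FIXED,
		       final_addr);
	if (moved == MAP_FAILED) {
		munmap(staging, len);
		return NULL;
	}
	return moved;
}

/* Small driver: returns 0 if the data arrived intact at the target. */
static int restore_via_staging_demo(void)
{
	const size_t len = 4096;
	unsigned char *pre, *res;
	void *target;
	int ok;

	pre = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (pre == MAP_FAILED)
		return -1;
	memset(pre, 0x5A, len);

	/* Reserve a destination range; mremap(MREMAP_FIXED) replaces it. */
	target = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (target == MAP_FAILED) {
		munmap(pre, len);
		return -1;
	}

	res = restore_via_staging(pre, len, target);
	if (res == NULL) {
		munmap(target, len);
		return -1; /* demo only: 'pre' may be leaked on this path */
	}

	ok = (void *)res == target && res[0] == 0x5A && res[len - 1] == 0x5A;
	munmap(res, len);
	return ok ? 0 : -1;
}
```

The point of the staging step is that the final move goes through the same remap path as every other VMA, so the ordering of the remap loop is preserved.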

Fixes: #2306

svilenkov and others added 2 commits October 16, 2025 22:43
* shstk_restorer_stack_size() – restorer shadow stack size
* shstk_set_restorer_stack() – set restorer shadow stack start

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
* shstk_restorer_stack_size(): PAGE_SIZE
* shstk_set_restorer_stack(): set restorer temporary shadow stack start

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
@mihalicyn mihalicyn force-pushed the x86/shstk branch 2 times, most recently from 61e8a76 to 4e0013b Compare October 16, 2025 20:53
@mihalicyn mihalicyn requested review from avagin and rppt October 16, 2025 20:55
@avagin
Member

avagin commented Oct 17, 2025

LGTM. Thanks a lot for taking this.

Member

@rppt rppt left a comment

Acked-by: Mike Rapoport <rppt@kernel.org>

@mihalicyn
Member Author

Acked-by: Mike Rapoport <rppt@kernel.org>

Thank you! ;)

mihalicyn and others added 5 commits October 17, 2025 18:53
* default: return whatever passed in
  eg. to be used as
     shtk_min_mmap_addr(kdat.mmap_min_addr)
* x86: ignore def and return 4G

On x86, the CET shadow stack is required to be mapped above 4GiB.
On the other hand, forcing 4GiB globally would break 32-bit restores.

Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
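A minimal sketch of the behaviour this commit message describes, assuming a per-architecture hook that either passes the default through or overrides it. The function names below are hypothetical and only illustrate the two variants; they are not the identifiers used in the tree.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB (UINT64_C(1) << 32)

/* Default variant (hypothetical name): return whatever was passed in. */
static inline uint64_t min_mmap_addr_default(uint64_t def)
{
	return def;
}

/* x86 shadow stack variant (hypothetical name): ignore def, return 4GiB. */
static inline uint64_t min_mmap_addr_x86_shstk(uint64_t def)
{
	(void)def;
	return FOUR_GIB;
}

/* Tiny self-check showing the intended behaviour of both variants. */
static void min_mmap_addr_selfcheck(void)
{
	assert(min_mmap_addr_default(0x10000) == 0x10000);
	assert(min_mmap_addr_x86_shstk(0x10000) == FOUR_GIB);
}
```

Keeping the override behind a hook rather than a global constant is what lets 32-bit restores keep using the ordinary minimum address.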
* reserve space for restorer shadow stack
* set tmp_shstk at mem, advance mem by PAGE_SIZE
* forget the extra PAGE_SIZE (shstk) for premapped VMAs

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
[ alex: small code cleanups ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
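The reservation described in this commit message amounts to simple pointer bumping. A sketch, assuming the restorer shadow stack is exactly one page as stated earlier in this PR; struct restorer_area and reserve_tmp_shstk() are hypothetical names, not the real CRIU symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout helper, not the CRIU implementation. */
struct restorer_area {
	uintptr_t tmp_shstk; /* start of the temporary restorer shadow stack */
	uintptr_t mem;       /* next free address after the reservation */
};

/* Place tmp_shstk at mem and advance mem by one page. */
static struct restorer_area reserve_tmp_shstk(uintptr_t mem, size_t page_size)
{
	struct restorer_area area;

	/* The cursor is assumed to be page aligned already. */
	assert(page_size != 0 && mem % page_size == 0);

	area.tmp_shstk = mem;
	area.mem = mem + page_size;
	return area;
}
```

With the page carved out of the restorer area this way, the premapped VMAs no longer need to carry the extra page.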
1. create shadow stack vma during vma_remap cycle
2. copy contents from a premapped non-shstk VMA into it
3. unmap premapped non-shstk VMA
4. Mark shstk VMA for remap into the final destination

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Co-Authored-By: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
[ alex: debugging, rework together with Andrei and code cleanup ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
* call shstk_vma_restore() for VMA_AREA_SHSTK in vma_remap()
* delete map/copy/unmap from shstk_restore() and keep token setup + finalize
* before this change the loop naturally stopped at cet->ssp-8, so a -8 nudge is now required here

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
[ alex: small code cleanups ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
* add SHSTK_ENABLE=1 toggle
* passes -mshstk to compiler and -z shstk to linker

Example:
  $ make -C test/zdtm/static clean
  $ make -C test/zdtm/static V=1 SHSTK_ENABLE=1 env00

  $ readelf --notes test/zdtm/static/env00 | grep SHSTK
      Properties: x86 feature: SHSTK

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
@svilenkov
Member

LGTM. Thanks!

@avagin avagin merged commit 7e70948 into checkpoint-restore:criu-dev Oct 17, 2025
37 of 44 checks passed