Skip to content

Conversation

@avagin
Copy link
Member

@avagin avagin commented Oct 21, 2025

No description provided.

@rst0git
Copy link
Member

rst0git commented Oct 21, 2025

@avagin Would be possible to include the patches from the following PRs in the release?

adrianreber and others added 10 commits November 2, 2025 07:42
Signed-off-by: Adrian Reber <areber@redhat.com>
This is highly confusing, and it seems that the ret variable
is not handled in the subsequent process.

Signed-off-by: Yuanhong Peng <yummypeng@linux.alibaba.com>
The stack test incorrectly assumed the page immediately
following the stack pointer could never be changed. This doesn't work,
because this page can be a part of another mapping.

This commit introduces a dedicated "stack redzone," a small guard region
directly after the stack. The stack test is modified to specifically
check for corruption within this redzone.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Thomas Gleixner introduced the new interface to create posix timers
with specifed timer IDs:
torvalds/linux@ec2d0c0

Previously, CRIU recreated timers by repeatedly creating and deleting
them until the desired ID was reached. This approach isn't fast,
especially for timers with large IDs. For example, restoring two timers
with IDs 1000000 and 2000000 took approximately 1.5 seconds.

The new `prctl()` based interface allows direct creation of timers with
specified IDs, reducing the restoration time to around 3 microseconds
for the same example.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
When handing errors for functions such as `ptrace()`, `pipe()`, and
`fork()` it would be better to use `pr_perror` instead of `pr_err`
as it would include a message describing the encountered error.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The `goto interrupt` label is unnecessary as the code directly
returns after `cuda_process_checkpoint_action()`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
On a RHEL 8 based system building CRIU fails with:

criu/arch/aarch64/crtools.c: In function 'save_pac_keys':
criu/arch/aarch64/crtools.c:73:39: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_PACA_KEYS'?
   ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov);
                                       ^~~~~~~~~~~~~~~~~~~~~~~
                                       NT_ARM_PACA_KEYS
criu/arch/aarch64/crtools.c:73:39: note: each undeclared identifier is reported only once for each function it appears in
criu/arch/aarch64/crtools.c: In function 'arch_ptrace_restore':
criu/arch/aarch64/crtools.c:261:44: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_PACA_KEYS'?
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~~~~~~~~
                                            NT_ARM_PACA_KEYS

This adds the missing define if it is undefined.

Signed-off-by: Adrian Reber <areber@redhat.com>
Currently we save FP regs before parasite code runs, and restore after
for --leave-running, --check-only, and in case of errors. In case of
errors the error may have happened before FP regs were saved, so we
should only restore them if they were actually saved.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
CRIU locks the network during restore in an "empty" network namespace.
However, "empty" in this context means CRIU isn't restoring the
namespace. This network namespace can be the same namespace where
processes have been dumped and so the network is already locked in it.

Fixes checkpoint-restore#2650

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Building CRIU package on Debian 11 aarch64 fails with

criu/arch/aarch64/crtools.c: In function 'save_pac_keys':
criu/arch/aarch64/crtools.c:32:31: error: storage size of 'paca' isn't known
  struct user_pac_address_keys paca;
                               ^~~~
criu/arch/aarch64/crtools.c:33:31: error: storage size of 'pacg' isn't known
  struct user_pac_generic_keys pacg;
                               ^~~~
criu/arch/aarch64/crtools.c:47:15: error: 'HWCAP_PACA' undeclared (first use in this function); did you mean 'HWCAP_FCMA'?
  if (hwcaps & HWCAP_PACA) {
               ^~~~~~~~~~
               HWCAP_FCMA
criu/arch/aarch64/crtools.c:47:15: note: each undeclared identifier is reported only once for each function it appears in
criu/arch/aarch64/crtools.c:53:44: error: 'NT_ARM_PACA_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PACA_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:73:39: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function)
   ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov);
                                       ^~~~~~~~~~~~~~~~~~~~~~~
criu/arch/aarch64/crtools.c:82:15: error: 'HWCAP_PACG' undeclared (first use in this function); did you mean 'HWCAP_AES'?
  if (hwcaps & HWCAP_PACG) {
               ^~~~~~~~~~
               HWCAP_AES
criu/arch/aarch64/crtools.c:88:44: error: 'NT_ARM_PACG_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PACG_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:33:31: error: unused variable 'pacg' [-Werror=unused-variable]
  struct user_pac_generic_keys pacg;
                               ^~~~
criu/arch/aarch64/crtools.c:32:31: error: unused variable 'paca' [-Werror=unused-variable]
  struct user_pac_address_keys paca;
                               ^~~~
criu/arch/aarch64/crtools.c: In function 'arch_ptrace_restore':
criu/arch/aarch64/crtools.c:227:31: error: storage size of 'upaca' isn't known
  struct user_pac_address_keys upaca;
                               ^~~~~
criu/arch/aarch64/crtools.c:228:31: error: storage size of 'upacg' isn't known
  struct user_pac_generic_keys upacg;
                               ^~~~~
criu/arch/aarch64/crtools.c:241:18: error: 'HWCAP_PACA' undeclared (first use in this function); did you mean 'HWCAP_FCMA'?
   if (!(hwcaps & HWCAP_PACA)) {
                  ^~~~~~~~~~
                  HWCAP_FCMA
criu/arch/aarch64/crtools.c:255:44: error: 'NT_ARM_PACA_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PACA_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:261:44: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function)
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~~~~~~~~
criu/arch/aarch64/crtools.c:268:18: error: 'HWCAP_PACG' undeclared (first use in this function); did you mean 'HWCAP_AES'?
   if (!(hwcaps & HWCAP_PACG)) {
                  ^~~~~~~~~~
                  HWCAP_AES
criu/arch/aarch64/crtools.c:275:44: error: 'NT_ARM_PACG_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PACG_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:233:6: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
  int ret;
      ^~~
criu/arch/aarch64/crtools.c:228:31: error: unused variable 'upacg' [-Werror=unused-variable]
  struct user_pac_generic_keys upacg;
                               ^~~~~
criu/arch/aarch64/crtools.c:227:31: error: unused variable 'upaca' [-Werror=unused-variable]
  struct user_pac_address_keys upaca;
                               ^~~~~
This patch adds the missing constants and structs if undefined.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Snorch and others added 11 commits November 2, 2025 07:48
Mount flags belong to mount and mount namespace of the Container, so we
should preserve them, as Container user will not expect mounts switching
between ro and rw over c/r.

Fixes: checkpoint-restore#2632

v5: fix both mount-v1 and mount-v2

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Add {'bind': 'path/to/bindmount'} zdtm descriptor option, so that in
test mount namespace a directory bindmount can be created before running
the test.

This is useful to leave test directory writable (e.g. for logs) while
the test makes root mount readonly. note: We create this bindmount early
so that all test files are opened on it initially and not on the below
mount. Will be used in mnt_ro_root test.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
It makes root mount readonly and checks that it is still readonly after
migration.

Make zdtm/static writable for logs via "bind" desc option.

v2: explain why we don't have explicit rw/ro flag check
v3: use new zdtm "bind" desc option

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
With Go version 1.24, ListenConfig now uses MPTCP by default [1].
Checkpoint/restore for this protocol is not currently supported
and adding support requires kernel changes that are not trivial
to implement. As a result, checkpointing of many containers that
run Go programs is likely to fail with the following error [2]:

(00.026522) Error (criu/sk-inet.c:130): inet: Unsupported proto 262 for socket 2f9bc5

This patch adds a message with suggested workaround for this problem.

[1] https://go.dev/doc/go1.24#netpkgnet
[2] checkpoint-restore#2655

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
In some cases, they might not work in virtual machines if the hypervisor
doesn't virtualize them. For example, they don't work in AMD SEV virtual
machines if the Debug Virtualization extension isn't supported or isn't
enabled in SEV_FEATURES.

Fixes checkpoint-restore#2658

Signed-off-by: Andrei Vagin <avagin@gmail.com>
In 0a7c5fd we swapped the BSD
implementation of strlcat and strlcpy in favor of our own replacement.

The checks and the predefined macros are not needed anymore.

Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
The container checkpointing procedure in Kubernetes freezes running
containers to create a consistent snapshot of both the runtime state
and the rootfs of the container. However, when checkpointing a GPU
container, the container must be unfrozen before invoking the
cuda-checkpoint tool.

This is achieved in prepare_freezer_for_interrupt_only_mode(), which
needs to be called before the PAUSE_DEVICES hook. The patch introducing
this functionality fixes this problem for containers with multiple
processes. However, if the container has a single process,
prepare_freezer_for_interrupt_only_mode() must be invoked immediately
before the PAUSE_DEVICES hook.

Fixes: checkpoint-restore#2514

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Building CRIU on Ubuntu 20.04 fails with the following error:

criu/sk-inet.c: In function 'can_dump_ipproto':
criu/sk-inet.c:131:16: error: 'IPPROTO_MPTCP' undeclared (first use in this function); did you mean 'IPPROTO_MTP'?
  131 |   if (proto == IPPROTO_MPTCP)
      |                ^~~~~~~~~~~~~
      |                IPPROTO_MTP

Add definition for MPTCP to fix this error.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Currently, in the target process, device-related restore operations and
other restore operations almost run sequentially. When the target
process executes the corresponding CRIU hook functions, it can't perform
other restore operations.  However, for GPU applications, some device
restore operations have no logical dependencies on other common restore
operations and can be parallelized with other operations to speed up the
process.

Instead of launching a thread in child processes for parallelization,
this patch chooses to add a new hook, `POST_FORKING`, in the main CRIU
process to handle these restore operations. This is because the
restoration of memory state in the restore blob is one of the most
time-consuming parts of all restore logic. The main CRIU process can
easily parallelize these operations, whereas parallelizing in threads
within child processes is challenging.

- POST_FORKING

*POST_FORKING: Hook to enable the main CRIU process to perform some
restore operations of plugins.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
Currently, when CRIU calls `cr_plugin_init`, `fdstore` is not
initialized. However, during the plugin restore procedure, there may be
some common file operations used in multiple hooks. This patch moves
`cr_plugin_init` after `fdstore_init`, allowing `cr_plugin_init` to use
`fdstore` to place these file operations.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
Currently, parallel restore only focuses on the single-process
situation. Therefore, it needs an interface to know if there is only one
process to restore. This patch adds a `has_children` function in
`pstree.h` and replaces some existing implementations with this
function.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
Igor Svilenkov Bozic and others added 27 commits November 5, 2025 15:41
1. create shadow stack vma during vma_remap cycle
2. copy contents from a premapped non-shstk VMA into it
3. unmap premapped non-shstk VMA
4. Mark shstk VMA for remap into the final destination

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Co-Authored-By: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
[ alex: debugging, rework together with Andrei and code cleanup ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
* call shstk_vma_restore() for VMA_AREA_SHSTK in vma_remap()
* delete map/copy/unmap from shstk_restore() and keep token setup + finalize
* before the loop naturally stopped at cet->ssp-8, so a -8 nudge is required here

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
[ alex: small code cleanups ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
* add SHSTK_ENABLE=1 toggle
* passes -mshstk to compiler and -z shstk to linker

Example:
  $ make -C test/zdtm/static clean
  $ make -C test/zdtm/static V=1 SHSTK_ENABLE=1 env00

  $ readelf --notes test/zdtm/static/env00 | grep SHSTK
      Properties: x86 feature: SHSTK

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
We use LGPL-v2.1 license for the libcriu and pycriu as they are
intended to be usable by both proprietary and open-source applications.

Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
pycriu depends on protobuf to function correctly. Currently,
it raises an error if protobuf is not installed. Adding
protobuf to the dependencies ensures it is available after
installing pycriu.

Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
Regardless of the actual error message, "Unknown" was always appended
to the end of the string, resulting in messages like:
"DUMP failed: Error(3): No process with such pidUnknown".

Fixed by changing standalone if statements to else-if blocks so
"Unknown" is only added when no specific error condition matches.

Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
This patch consolidates the action-script tests into
`test/others/action-script` to ensure all tests are executed
consistently and reduce duplication. Since we had two tests that appear
to do the same thing, we can remove the one that doesn't use zdtm.py.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The existing test collects all action-script hooks triggered during
`h`, `ns`, and `uns` runs with ZDTM into `actions_called.txt`, then
verifies that each hook appears at least once. However, the test does
not verify that hooks are invoked *exactly once* or in *correct order*.

This change updates the test to run ZDTM only with ns flavour as this
seems to cover all action-script hooks, and checks that all hooks are
called correctly.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Don't install external pip dependencies when running `make install`.
As we are not really into developing a Python project, we should
not install additional packages. CRIU does that nowhere else.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
These dependencies are required to for `pip install`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
which is used in Makefiles to check for dependencies:

Example:
export USE_ASCIIDOCTOR ?= $(shell which asciidoctor 2>/dev/null)

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Unlike "which", which is a separate executable not always installed by
default, "command -v" is a shell built-in available at least for bash,
dash, and busybox shell.

Unlike "which", "command -v" is also easier to grep for, and it is
already used in a few places here.

Inspired by commit 57251d8.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Container runtimes that use libcriu (e.g., crun) need to specify a CRIU
configuration file that allows to overwrite default options set via RPC.
This is particularly useful to set options such as `--tcp-established`
via `/etc/criu/runc.conf` in Kubernetes.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Use system-installed CRIU binary instead of a local file

Thanks to @avagin for suggesting this solution.

Co-authored-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
[Errno 2] No such file or directory -> Socket file not found.
[Errno 111] Connection refused -> Service not running.

Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
This change allows users to call criu.use_sk() without any
parameters to use the default socket name.

Co-authored-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
Move the code that opens the images directory, resolves its absolute
path via readlink(), selects the work_dir, and chdir()s into it into a
new function: setup_images_and_workdir(). This reduces the size of
`setup_opts_from_req()`, improves its readability, and allows this
functionality to be reused.

While at it, change open_image_dir() to take a const char *dir
parameter, reflecting that the path is not modified by the function and
allowing callers to pass string literals without casts.

No functional changes are intended.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Commit 9089ce8 ("service: use setproctitle") extended cr-service to
get the full path of images_dir using readlink(). However, the RPC
API was later extended to allow setting a custom path (folder) to
be set instead of passing a file descriptor, which causes readlink()
to fail as the path is not a symbolic link.

It would be better to drop the code setting the images-dir path as a
string in the proctitle.

Fixes: checkpoint-restore#2794

Suggested-by: Andrei Vagin <avagin@google.com>
Co-authored-by: Andrii Herheliuk <andrii@herheliuk.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Move the images_dir selection logic from setup_opts_from_req() into a
new function: resolve_images_dir_path(). This improves readability and
allows the code to be reused. While at it, use snprintf() instead of
sprintf() for the /proc path and ensure NULL termination after strncpy().

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Move the logging initialization into a helper function that
can be reused.

No functional change intended.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The check() functionality is very different from dump, pre-dump,
and restore. It is used only to check if the kernel supports required
features, and does not need the majority of options set via RPC.

In particular, we don't need to open `image_dir` when running `check()`
because this functionality doesn't create or process image files. In
this case, `image_dir` is used as `work_dir`, only when the latter is
not specified and a log file is used.

This patch updates the RPC options parser so that it only handles the
logging options when check() is used. Logging to a file is required when
log_file is explicitly set or no log_to_stderr is used. In such case, we
also resolve images_dir and work_dir where the log file will be created.

Fixes: checkpoint-restore#2758

Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This allows users to specify RPC options when
using the check() functionality.

Co-authored-by: Andrii Herheliuk <andrii@herheliuk.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
_init__.py defines the public API for pycriu. It is important to use
explicit imports to avoid leaking every symbol from criu.py into the
pycriu namespace. This avoids import-time side effects, prevents name
collisions, and circular-import traps.

Fixes the following lint error:
  F403 `from .criu import *` used; unable to detect undefined names

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The --mntns-compat-mode option is no longer parsed with CHECK.
Use --log-file instead to test the error message.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Use nr_pages when available, falling back to compat_nr_pages
for compatibility.

Signed-off-by: alam0rt <sam@samlockart.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
@avagin avagin merged commit cb8e1da into checkpoint-restore:master Nov 5, 2025
19 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.