Start baking CRIU 4.2 #2791
Merged
@avagin Would it be possible to include the patches from the following PRs in the release?
Signed-off-by: Adrian Reber <areber@redhat.com>
This is highly confusing, and it seems that the ret variable is not handled in the subsequent process. Signed-off-by: Yuanhong Peng <yummypeng@linux.alibaba.com>
The stack test incorrectly assumed the page immediately following the stack pointer could never be changed. This doesn't work, because this page can be a part of another mapping. This commit introduces a dedicated "stack redzone," a small guard region directly after the stack. The stack test is modified to specifically check for corruption within this redzone. Signed-off-by: Andrei Vagin <avagin@gmail.com>
Thomas Gleixner introduced a new interface to create POSIX timers with specified timer IDs: torvalds/linux@ec2d0c0 Previously, CRIU recreated timers by repeatedly creating and deleting them until the desired ID was reached. This approach isn't fast, especially for timers with large IDs. For example, restoring two timers with IDs 1000000 and 2000000 took approximately 1.5 seconds. The new `prctl()`-based interface allows direct creation of timers with specified IDs, reducing the restoration time to around 3 microseconds for the same example. Signed-off-by: Andrei Vagin <avagin@gmail.com>
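To see why the create-and-delete approach scales with the timer ID, here is a toy model in Python (not CRIU code). It assumes timer IDs are handed out sequentially, which is what the probing approach relies on. `FakeTimerAllocator`, `restore_by_probing`, and `restore_directly` are hypothetical names used only for illustration; the real mechanism is the kernel interface linked above.

```python
class FakeTimerAllocator:
    """Toy stand-in for timer ID allocation (assumed to be sequential)."""

    def __init__(self):
        self.next_id = 0
        self.live = set()
        self.calls = 0  # number of create/delete operations issued

    def create(self, wanted_id=None):
        self.calls += 1
        if wanted_id is None:
            # Old behaviour: the caller cannot choose the ID.
            timer_id = self.next_id
            self.next_id += 1
        else:
            # New behaviour: the caller asks for a specific ID.
            timer_id = wanted_id
        self.live.add(timer_id)
        return timer_id

    def delete(self, timer_id):
        self.calls += 1
        self.live.remove(timer_id)


def restore_by_probing(alloc, wanted_ids):
    """Create and delete timers until each wanted ID comes up."""
    for wanted in sorted(wanted_ids):
        while True:
            timer_id = alloc.create()
            if timer_id == wanted:
                break
            alloc.delete(timer_id)


def restore_directly(alloc, wanted_ids):
    """Create each timer with its final ID in a single call."""
    for wanted in sorted(wanted_ids):
        alloc.create(wanted_id=wanted)
```

In this model the probing variant issues a number of calls proportional to the largest ID, while the direct variant issues one call per timer.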
When handling errors for functions such as `ptrace()`, `pipe()`, and `fork()`, it is better to use `pr_perror` instead of `pr_err`, as it includes a message describing the encountered error. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The `goto interrupt` label is unnecessary as the code directly returns after `cuda_process_checkpoint_action()`. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
On a RHEL 8 based system building CRIU fails with:
criu/arch/aarch64/crtools.c: In function 'save_pac_keys':
criu/arch/aarch64/crtools.c:73:39: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_PACA_KEYS'?
ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov);
^~~~~~~~~~~~~~~~~~~~~~~
NT_ARM_PACA_KEYS
criu/arch/aarch64/crtools.c:73:39: note: each undeclared identifier is reported only once for each function it appears in
criu/arch/aarch64/crtools.c: In function 'arch_ptrace_restore':
criu/arch/aarch64/crtools.c:261:44: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_PACA_KEYS'?
if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov))) {
^~~~~~~~~~~~~~~~~~~~~~~
NT_ARM_PACA_KEYS
This adds the missing define if it is undefined.
Signed-off-by: Adrian Reber <areber@redhat.com>
Currently we save FP regs before parasite code runs, and restore after for --leave-running, --check-only, and in case of errors. In case of errors the error may have happened before FP regs were saved, so we should only restore them if they were actually saved. Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
CRIU locks the network during restore in an "empty" network namespace. However, "empty" in this context means CRIU isn't restoring the namespace. This network namespace can be the same namespace where processes have been dumped and so the network is already locked in it. Fixes checkpoint-restore#2650 Signed-off-by: Andrei Vagin <avagin@gmail.com>
Building CRIU package on Debian 11 aarch64 fails with
criu/arch/aarch64/crtools.c: In function 'save_pac_keys':
criu/arch/aarch64/crtools.c:32:31: error: storage size of 'paca' isn't known
struct user_pac_address_keys paca;
^~~~
criu/arch/aarch64/crtools.c:33:31: error: storage size of 'pacg' isn't known
struct user_pac_generic_keys pacg;
^~~~
criu/arch/aarch64/crtools.c:47:15: error: 'HWCAP_PACA' undeclared (first use in this function); did you mean 'HWCAP_FCMA'?
if (hwcaps & HWCAP_PACA) {
^~~~~~~~~~
HWCAP_FCMA
criu/arch/aarch64/crtools.c:47:15: note: each undeclared identifier is reported only once for each function it appears in
criu/arch/aarch64/crtools.c:53:44: error: 'NT_ARM_PACA_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PACA_KEYS, &iov))) {
^~~~~~~~~~~~~~~~
NT_ARM_SVE
criu/arch/aarch64/crtools.c:73:39: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function)
ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov);
^~~~~~~~~~~~~~~~~~~~~~~
criu/arch/aarch64/crtools.c:82:15: error: 'HWCAP_PACG' undeclared (first use in this function); did you mean 'HWCAP_AES'?
if (hwcaps & HWCAP_PACG) {
^~~~~~~~~~
HWCAP_AES
criu/arch/aarch64/crtools.c:88:44: error: 'NT_ARM_PACG_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PACG_KEYS, &iov))) {
^~~~~~~~~~~~~~~~
NT_ARM_SVE
criu/arch/aarch64/crtools.c:33:31: error: unused variable 'pacg' [-Werror=unused-variable]
struct user_pac_generic_keys pacg;
^~~~
criu/arch/aarch64/crtools.c:32:31: error: unused variable 'paca' [-Werror=unused-variable]
struct user_pac_address_keys paca;
^~~~
criu/arch/aarch64/crtools.c: In function 'arch_ptrace_restore':
criu/arch/aarch64/crtools.c:227:31: error: storage size of 'upaca' isn't known
struct user_pac_address_keys upaca;
^~~~~
criu/arch/aarch64/crtools.c:228:31: error: storage size of 'upacg' isn't known
struct user_pac_generic_keys upacg;
^~~~~
criu/arch/aarch64/crtools.c:241:18: error: 'HWCAP_PACA' undeclared (first use in this function); did you mean 'HWCAP_FCMA'?
if (!(hwcaps & HWCAP_PACA)) {
^~~~~~~~~~
HWCAP_FCMA
criu/arch/aarch64/crtools.c:255:44: error: 'NT_ARM_PACA_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PACA_KEYS, &iov))) {
^~~~~~~~~~~~~~~~
NT_ARM_SVE
criu/arch/aarch64/crtools.c:261:44: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function)
if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov))) {
^~~~~~~~~~~~~~~~~~~~~~~
criu/arch/aarch64/crtools.c:268:18: error: 'HWCAP_PACG' undeclared (first use in this function); did you mean 'HWCAP_AES'?
if (!(hwcaps & HWCAP_PACG)) {
^~~~~~~~~~
HWCAP_AES
criu/arch/aarch64/crtools.c:275:44: error: 'NT_ARM_PACG_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PACG_KEYS, &iov))) {
^~~~~~~~~~~~~~~~
NT_ARM_SVE
criu/arch/aarch64/crtools.c:233:6: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
int ret;
^~~
criu/arch/aarch64/crtools.c:228:31: error: unused variable 'upacg' [-Werror=unused-variable]
struct user_pac_generic_keys upacg;
^~~~~
criu/arch/aarch64/crtools.c:227:31: error: unused variable 'upaca' [-Werror=unused-variable]
struct user_pac_address_keys upaca;
^~~~~
This patch adds the missing constants and structs if undefined.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
8cf6bb3 to db4d493
Mount flags belong to the mount and the mount namespace of the container, so we should preserve them, as a container user will not expect mounts to switch between ro and rw across c/r.
Fixes: checkpoint-restore#2632
v5: fix both mount-v1 and mount-v2
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Add {'bind': 'path/to/bindmount'} zdtm descriptor option, so that in
test mount namespace a directory bindmount can be created before running
the test.
This is useful to leave test directory writable (e.g. for logs) while
the test makes root mount readonly. note: We create this bindmount early
so that all test files are opened on it initially and not on the below
mount. Will be used in mnt_ro_root test.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
It makes root mount readonly and checks that it is still readonly after migration. Make zdtm/static writable for logs via "bind" desc option. v2: explain why we don't have explicit rw/ro flag check v3: use new zdtm "bind" desc option Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
With Go version 1.24, ListenConfig now uses MPTCP by default [1]. Checkpoint/restore for this protocol is not currently supported, and adding support requires kernel changes that are not trivial to implement. As a result, checkpointing of many containers that run Go programs is likely to fail with the following error [2]:
(00.026522) Error (criu/sk-inet.c:130): inet: Unsupported proto 262 for socket 2f9bc5
This patch adds a message with a suggested workaround for this problem.
[1] https://go.dev/doc/go1.24#netpkgnet
[2] checkpoint-restore#2655
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
In some cases, they might not work in virtual machines if the hypervisor doesn't virtualize them. For example, they don't work in AMD SEV virtual machines if the Debug Virtualization extension isn't supported or isn't enabled in SEV_FEATURES. Fixes checkpoint-restore#2658 Signed-off-by: Andrei Vagin <avagin@gmail.com>
In 0a7c5fd we swapped the BSD implementation of strlcat and strlcpy in favor of our own replacement. The checks and the predefined macros are not needed anymore. Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
The container checkpointing procedure in Kubernetes freezes running containers to create a consistent snapshot of both the runtime state and the rootfs of the container. However, when checkpointing a GPU container, the container must be unfrozen before invoking the cuda-checkpoint tool. This is achieved in prepare_freezer_for_interrupt_only_mode(), which needs to be called before the PAUSE_DEVICES hook. The patch introducing this functionality fixes this problem for containers with multiple processes. However, if the container has a single process, prepare_freezer_for_interrupt_only_mode() must be invoked immediately before the PAUSE_DEVICES hook. Fixes: checkpoint-restore#2514 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Building CRIU on Ubuntu 20.04 fails with the following error:
criu/sk-inet.c: In function 'can_dump_ipproto':
criu/sk-inet.c:131:16: error: 'IPPROTO_MPTCP' undeclared (first use in this function); did you mean 'IPPROTO_MTP'?
131 | if (proto == IPPROTO_MPTCP)
| ^~~~~~~~~~~~~
| IPPROTO_MTP
Add definition for MPTCP to fix this error.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
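The same define-if-missing idea can be sketched in Python (the actual patch is C). The snippet uses the constant from the standard `socket` module when it exists and otherwise falls back to 262, the Linux value that also appears in the "Unsupported proto 262" error quoted earlier in this PR. `is_mptcp` is a hypothetical helper, not a CRIU function.

```python
import socket

# Use the platform's constant when available; otherwise fall back to the
# Linux value (262), mirroring an "#ifndef ... #define" guard in C.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def is_mptcp(proto):
    """Return True if the numeric protocol is MPTCP (hypothetical helper)."""
    return proto == IPPROTO_MPTCP
```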
Currently, in the target process, device-related restore operations and other restore operations run almost sequentially. While the target process executes the corresponding CRIU hook functions, it can't perform other restore operations. However, for GPU applications, some device restore operations have no logical dependencies on other common restore operations and can be parallelized with them to speed up the process.
Instead of launching a thread in child processes for parallelization, this patch adds a new hook, `POST_FORKING`, in the main CRIU process to handle these restore operations. This is because the restoration of memory state in the restore blob is one of the most time-consuming parts of all restore logic. The main CRIU process can easily parallelize these operations, whereas parallelizing in threads within child processes is challenging.
- POST_FORKING: Hook to enable the main CRIU process to perform some restore operations of plugins.
Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
Currently, when CRIU calls `cr_plugin_init`, `fdstore` is not initialized. However, during the plugin restore procedure, there may be some common file operations used in multiple hooks. This patch moves `cr_plugin_init` after `fdstore_init`, allowing `cr_plugin_init` to use `fdstore` to place these file operations. Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
Currently, parallel restore only focuses on the single-process situation. Therefore, it needs an interface to know if there is only one process to restore. This patch adds a `has_children` function in `pstree.h` and replaces some existing implementations with this function. Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
1. create shadow stack vma during vma_remap cycle
2. copy contents from a premapped non-shstk VMA into it
3. unmap premapped non-shstk VMA
4. Mark shstk VMA for remap into the final destination
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Co-Authored-By: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
[ alex: debugging, rework together with Andrei and code cleanup ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
* call shstk_vma_restore() for VMA_AREA_SHSTK in vma_remap()
* delete map/copy/unmap from shstk_restore() and keep token setup + finalize
* before the loop naturally stopped at cet->ssp-8, so a -8 nudge is required here
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
[ alex: small code cleanups ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
* add SHSTK_ENABLE=1 toggle
* passes -mshstk to compiler and -z shstk to linker
Example:
$ make -C test/zdtm/static clean
$ make -C test/zdtm/static V=1 SHSTK_ENABLE=1 env00
$ readelf --notes test/zdtm/static/env00 | grep SHSTK
Properties: x86 feature: SHSTK
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
We use the LGPL-v2.1 license for libcriu and pycriu, as they are intended to be usable by both proprietary and open-source applications. Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
pycriu depends on protobuf to function correctly. Currently, it raises an error if protobuf is not installed. Adding protobuf to the dependencies ensures it is available after installing pycriu. Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
Regardless of the actual error message, "Unknown" was always appended to the end of the string, resulting in messages like: "DUMP failed: Error(3): No process with such pidUnknown". Fixed by changing standalone if statements to else-if blocks so "Unknown" is only added when no specific error condition matches. Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
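A minimal sketch of the else-if structure described above. It is not the pycriu implementation: `gen_error_str` is a hypothetical function, the request type is a plain string, and the `RESTORE` message text is illustrative. Only the `DUMP`/"No process with such pid" case comes from the commit message.

```python
import errno


def gen_error_str(req_type, err):
    """Build a failure message; "Unknown" is used only as a last resort."""
    s = f"{req_type} failed: Error({err}): "
    if req_type == "DUMP" and err == errno.ESRCH:
        s += "No process with such pid"
    elif req_type == "RESTORE" and err == errno.EEXIST:
        # Illustrative message text, not taken from the commit.
        s += "Process with requested pid already exists"
    else:
        # Reached only when no specific condition matched. With standalone
        # `if` statements this suffix was appended unconditionally.
        s += "Unknown"
    return s
```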
This patch consolidates the action-script tests into `test/others/action-script` to ensure all tests are executed consistently and reduce duplication. Since we had two tests that appear to do the same thing, we can remove the one that doesn't use zdtm.py. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The existing test collects all action-script hooks triggered during `h`, `ns`, and `uns` runs with ZDTM into `actions_called.txt`, then verifies that each hook appears at least once. However, the test does not verify that hooks are invoked *exactly once* or in *correct order*. This change updates the test to run ZDTM only with ns flavour as this seems to cover all action-script hooks, and checks that all hooks are called correctly. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
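The difference between "appears at least once" and "exactly once, in order" can be sketched as follows. `check_hooks` is a hypothetical helper, and the hook list in the usage note is an illustrative subset rather than the order the real test expects.

```python
from collections import Counter


def check_hooks(called, expected):
    """Return a list of problems; an empty list means the run was correct.

    `called` is the sequence of hook names recorded during the run and
    `expected` is the sequence (without duplicates) that should be seen.
    """
    problems = []
    counts = Counter(called)
    for hook in expected:
        if counts[hook] != 1:
            problems.append(f"{hook}: called {counts[hook]} times, expected 1")
    for hook in counts:
        if hook not in expected:
            problems.append(f"{hook}: unexpected hook")
    # Counts are fine, so any remaining difference is an ordering problem.
    if not problems and list(called) != list(expected):
        problems.append("hooks called in the wrong order")
    return problems
```

For example, `check_hooks(["pre-dump", "pre-dump", "post-dump"], ["pre-dump", "post-dump"])` reports the duplicate call, which a presence-only check would accept.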
Don't install external pip dependencies when running `make install`. As we are not really into developing a Python project, we should not install additional packages. CRIU does that nowhere else. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
These dependencies are required for `pip install`. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
which is used in Makefiles to check for dependencies: Example: export USE_ASCIIDOCTOR ?= $(shell which asciidoctor 2>/dev/null) Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Unlike "which", which is a separate executable not always installed by default, "command -v" is a shell built-in available at least for bash, dash, and busybox shell. Unlike "which", "command -v" is also easier to grep for, and it is already used in a few places here. Inspired by commit 57251d8. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Container runtimes that use libcriu (e.g., crun) need to specify a CRIU configuration file that can override the default options set via RPC. This is particularly useful for setting options such as `--tcp-established` via `/etc/criu/runc.conf` in Kubernetes. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Use the system-installed CRIU binary instead of a local file. Thanks to @avagin for suggesting this solution. Co-authored-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
[Errno 2] No such file or directory -> Socket file not found. [Errno 111] Connection refused -> Service not running. Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
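One way such a mapping could look, as a sketch only: `explain_connect_error` and the `FRIENDLY` table are hypothetical names, and how pycriu actually surfaces these messages is not shown in this PR. The errno constants are taken from the standard `errno` module rather than hard-coded.

```python
import errno

# Friendlier explanations for the two connection failures named above.
FRIENDLY = {
    errno.ENOENT: "Socket file not found.",
    errno.ECONNREFUSED: "Service not running.",
}


def explain_connect_error(exc):
    """Map an OSError from connect() to a short hint, else keep its text."""
    return FRIENDLY.get(exc.errno, str(exc))
```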
This change allows users to call criu.use_sk() without any parameters to use the default socket name. Co-authored-by: Radostin Stoyanov <rstoyanov@fedoraproject.org> Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
Move the code that opens the images directory, resolves its absolute path via readlink(), selects the work_dir, and chdir()s into it into a new function: setup_images_and_workdir(). This reduces the size of `setup_opts_from_req()`, improves its readability, and allows this functionality to be reused. While at it, change open_image_dir() to take a const char *dir parameter, reflecting that the path is not modified by the function and allowing callers to pass string literals without casts. No functional changes are intended. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Commit 9089ce8 ("service: use setproctitle") extended cr-service to get the full path of images_dir using readlink(). However, the RPC API was later extended to allow a custom path (folder) to be set instead of passing a file descriptor, which causes readlink() to fail as the path is not a symbolic link. It would be better to drop the code setting the images-dir path as a string in the proctitle. Fixes: checkpoint-restore#2794 Suggested-by: Andrei Vagin <avagin@google.com> Co-authored-by: Andrii Herheliuk <andrii@herheliuk.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Move the images_dir selection logic from setup_opts_from_req() into a new function: resolve_images_dir_path(). This improves readability and allows the code to be reused. While at it, use snprintf() instead of sprintf() for the /proc path and ensure NULL termination after strncpy(). Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Move the logging initialization into a helper function that can be reused. No functional change intended. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The check() functionality is very different from dump, pre-dump, and restore. It is used only to check if the kernel supports required features, and does not need the majority of options set via RPC. In particular, we don't need to open `image_dir` when running `check()` because this functionality doesn't create or process image files. In this case, `image_dir` is used as `work_dir` only when the latter is not specified and a log file is used. This patch updates the RPC options parser so that it only handles the logging options when check() is used. Logging to a file is required when log_file is explicitly set or log_to_stderr is not set. In that case, we also resolve images_dir and work_dir, where the log file will be created. Fixes: checkpoint-restore#2758 Suggested-by: Andrei Vagin <avagin@google.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
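The decision described above can be summarized in a small Python sketch. The real logic lives in the C RPC options parser; `needs_log_file` and `pick_log_dir` are hypothetical helpers that only restate the two rules from the commit message.

```python
def needs_log_file(log_file=None, log_to_stderr=False):
    """A log file is needed if one was requested or stderr logging is off."""
    return log_file is not None or not log_to_stderr


def pick_log_dir(work_dir, images_dir):
    """Use work_dir when given; otherwise fall back to images_dir."""
    return work_dir if work_dir is not None else images_dir
```

Under these rules, a `check()` request that sets `log_to_stderr` and no `log_file` never needs either directory to be resolved.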
This allows users to specify RPC options when using the check() functionality. Co-authored-by: Andrii Herheliuk <andrii@herheliuk.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
`__init__.py` defines the public API for pycriu. It is important to use explicit imports to avoid leaking every symbol from criu.py into the pycriu namespace. This avoids import-time side effects and prevents name collisions and circular-import traps. Fixes the following lint error: F403 `from .criu import *` used; unable to detect undefined names Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The --mntns-compat-mode option is no longer parsed with CHECK. Use --log-file instead to test the error message. Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
b33b21e to 0fa6ff3
Use nr_pages when available, falling back to compat_nr_pages for compatibility. Signed-off-by: alam0rt <sam@samlockart.com> Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
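A minimal sketch of the fallback, assuming the entry has been decoded into a dict-like object in which a missing field is simply absent. `pages_in_entry` is a hypothetical helper, not the function changed by this commit.

```python
def pages_in_entry(entry):
    """Return the page count, preferring nr_pages over compat_nr_pages."""
    if "nr_pages" in entry:
        return entry["nr_pages"]
    # Entries that lack nr_pages are assumed to carry compat_nr_pages.
    return entry["compat_nr_pages"]
```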