Upgrade Linux kernel from 6.6 to 6.12 #2300

Open: wants to merge 7 commits into main

ader1990
Contributor

@ader1990 ader1990 commented Sep 10, 2024

Upgrade the Linux kernel from the 6.6.y stable branch to the 6.12.y stable branch.

See: flatcar/Flatcar#1527

This PR is mostly to reveal any possible big blockers before getting to the new 6.12 LTS release.

Tested 6.10.y and it works as expected.
Tested 6.11.y and it works as expected.

Now testing 6.12.y.

Testing done

[Describe the testing you have done before submitting this PR. Please include both the commands you issued as well as the output you got.]

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.

Boot partition size:

arm64: /dev/nvme0n1p1     129039    63368     65672  50% /boot
amd64: /dev/vda1          129039    62852     66187  49% /boot
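
As a quick sanity check on the figures above, the following sketch (Python, not part of the PR) recomputes the use percentage from the block counts. The helper name `use_percent` is hypothetical, and both the 1K block unit and the rounding-up behaviour are assumptions that happen to match the values printed here.

```python
import math

# The two df lines quoted above, labelled by architecture.
DF_LINES = {
    "arm64": "/dev/nvme0n1p1     129039    63368     65672  50% /boot",
    "amd64": "/dev/vda1          129039    62852     66187  49% /boot",
}


def use_percent(df_line: str) -> int:
    """Recompute Use% as used / (used + available), rounded up (assumed)."""
    fields = df_line.split()
    used, available = int(fields[2]), int(fields[3])
    return math.ceil(100 * used / (used + available))


for arch, line in DF_LINES.items():
    available = line.split()[3]
    print(f"{arch}: {use_percent(line)}% used, {available} blocks available")
```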

@ader1990
Contributor Author

ZFS 2.2.5 does not support kernel 6.10. The ZFS upgrade patches will be dropped once the portage-stable update PR (which brings ZFS 2.2.6) gets merged: #2298

github-actions bot commented Sep 10, 2024

Build action triggered: https://github.com/flatcar/scripts/actions/runs/14906773777

@@ -36,6 +36,5 @@ IUSE=""
# local patches overlap with the upstream patch.
UNIPATCH_LIST="
${PATCH_DIR}/z0001-kbuild-derive-relative-path-for-srctree-from-CURDIR.patch \
${PATCH_DIR}/z0002-revert-pahole-flags.patch \
Member

Have you tested this?
When pahole is executed with -j (parallel), the BTF metadata order is non-deterministic, and the built kernel and modules don't match.

It doesn't have to be a revert, but we need to carry some patch (unless something significant changed).
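
A toy illustration of the concern (Python, not pahole's actual implementation): if records are emitted in whatever order parallel workers finish, the resulting bytes, and therefore any checksum or ID derived from them, can differ between two builds of the same source. The record names below are hypothetical placeholders.

```python
import hashlib
from itertools import permutations

# Hypothetical names standing in for per-compilation-unit type records.
UNITS = ["task_struct", "inode", "sk_buff"]


def digest(sequence):
    """Hash a sequence of records, as a stand-in for comparing build output."""
    return hashlib.sha256("\n".join(sequence).encode()).hexdigest()


# Parallel workers may finish in any order, so every permutation is possible.
parallel_digests = {digest(order) for order in permutations(UNITS)}

# Forcing a stable order before emitting removes the dependence on scheduling.
stable_digests = {digest(sorted(order)) for order in permutations(UNITS)}

print(len(parallel_digests), "distinct outputs without a stable order")
print(len(stable_digests), "distinct output with a stable order")
```

Sorting is only one way to get a stable order; the point is that the order must not depend on thread scheduling.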

Contributor Author

Definitely, working on it. The pahole flags moved to scripts/Makefile.btf, so that needs to be addressed; I am working on a patch now.

Member

We recently updated pahole to a newer version (1.27) that was supposed to be reproducible regardless of how many threads it uses, but dropping the patch didn't work for me.

Member

Seems like we may still need a kernel patch to pass --btf_features=all,reproducible_build: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=v1.27&id=43bd3efa85656565129063cdd6dd7499e44a7867

Member

This could be upstreamed.

Contributor Author

I will test it asap and send it to LKML if it works.

Contributor Author

Added the reproducible_build flag to the pahole params, although I don't like how that addition is done. That file is a beautiful soup and needs some better management.

@jepio
Member

jepio commented Sep 10, 2024

Things like this make me want to wait with an upgrade to 6.10:
"[regression] significant delays when secureboot is enabled since 6.10" https://lore.kernel.org/lkml/92fbcc4c252ec9070d71a6c7d4f1d196ec67eeb0.camel@huaweicloud.com/T/#mb17f32470541d54f7ee45987d510aa45b7557969

It takes a couple minor releases on a new stable branch before it is ready to make its way into Flatcar.

@ader1990 ader1990 marked this pull request as draft September 10, 2024 10:36
@ader1990
Contributor Author

Things like this make me want to wait with an upgrade to 6.10: "[regression] significant delays when secureboot is enabled since 6.10" https://lore.kernel.org/lkml/92fbcc4c252ec9070d71a6c7d4f1d196ec67eeb0.camel@huaweicloud.com/T/#mb17f32470541d54f7ee45987d510aa45b7557969

It takes a couple minor releases on a new stable branch before it is ready to make its way into Flatcar.

Adding the blocker bug here: https://bugzilla.kernel.org/show_bug.cgi?id=219229

Possible resolution from the bug discussion in the kernel config:

CONFIG_TCG_TPM2_HMAC=n
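
To confirm whether such an option is actually off in a built image, the generated kernel config can be inspected. Note that a disabled option normally appears in a .config file as a `# ... is not set` comment rather than as `=n`. The sketch below (Python) assumes the config is available as text; the helper name `config_state` and the sample content are hypothetical.

```python
def config_state(config_text: str, option: str):
    """Return 'y', 'm', another value, 'n' for "is not set", or None if absent."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Hypothetical excerpt, not taken from the real Flatcar config.
SAMPLE_CONFIG = """\
CONFIG_TCG_TPM=y
# CONFIG_TCG_TPM2_HMAC is not set
CONFIG_MODULE_COMPRESS_XZ=y
"""

print(config_state(SAMPLE_CONFIG, "CONFIG_TCG_TPM2_HMAC"))
```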

@ader1990
Contributor Author

The CONFIG_TCG_TPM2_HMAC feature was introduced in 6.10 as an extra security layer: https://github.com/torvalds/linux/blob/master/drivers/char/tpm/Kconfig#L37

@ader1990
Contributor Author

Managed to get the ARM64 image built, but the AMD64 image fails at the initrd/grub stage with error: cpio: premature end of file.

Full error below:

2024-09-11T13:33:24.1344201Z INFO    grub_install.sh: Installing GRUB x86_64-xen in flatcar_production_image.bin
2024-09-11T13:33:24.1537425Z INFO    grub_install.sh: Compressing modules in flatcar/grub/x86_64-xen
2024-09-11T13:33:25.2833839Z INFO    grub_install.sh: Generating flatcar/grub/x86_64-xen/load.cfg
2024-09-11T13:33:25.3866147Z INFO    grub_install.sh: Generating flatcar/grub/x86_64-xen/core.elf
2024-09-11T13:33:25.4519108Z INFO    grub_install.sh: Installing default x86_64 Xen bootloader.
2024-09-11T13:33:25.5266195Z INFO    grub_install.sh: Elapsed time (grub_install.sh): 0m2s
2024-09-11T13:33:25.5754372Z INFO    build_image: Generating flatcar_production_image_pcr_policy.zip
2024-09-11T13:33:25.8790423Z INFO    build_image: Writing flatcar_production_image_contents.txt
2024-09-11T13:33:26.7383193Z INFO    build_image: Writing flatcar_production_image_contents_wtd.txt
2024-09-11T13:33:26.9908326Z cpio: premature end of file
2024-09-11T13:33:26.9914934Z rmdir: failed to remove '/home/sdk/trunk/src/scripts/artifacts/amd64-usr/developer-4089.0.0+nightly-20240910-2100-12-g6dd0a5b3f7-a1/tmp_initrd_contents/rootfs-0': Directory not empty
2024-09-11T13:33:27.0063435Z ERROR   build_image: script called: build_image '--board=amd64-usr' '--group=developer' '--output_root=/home/sdk/trunk/src/scripts/artifacts' 'prodtar' 'container' 'sysext'
2024-09-11T13:33:27.0069179Z ERROR   build_image: Backtrace:  (most recent call is last)
2024-09-11T13:33:27.0086074Z ERROR   build_image:   file build_image, line 173, called: create_prod_image 'flatcar_production_image.bin' 'base' 'developer' 'coreos-base/coreos' 'containerd-flatcar:app-containers/containerd,docker-flatcar:app-containers/docker&app-containers/docker-cli&app-containers/docker-buildx'
2024-09-11T13:33:27.0103055Z ERROR   build_image:   file prod_image_util.sh, line 169, called: finish_image 'flatcar_production_image.bin' 'base' '/home/sdk/trunk/src/scripts/artifacts/amd64-usr/developer-4089.0.0+nightly-20240910-2100-12-g6dd0a5b3f7-a1/rootfs' 'flatcar_production_image_contents.txt' 'flatcar_production_image_contents_wtd.txt' 'flatcar_production_image.vmlinuz' 'flatcar_production_image_pcr_policy.zip' 'flatcar_production_image.grub' 'flatcar_production_image.shim' 'flatcar_production_image_kernel_config.txt' 'flatcar_production_image_initrd_contents.txt' 'flatcar_production_image_initrd_contents_wtd.txt' 'flatcar_production_image_disk_usage.txt'
2024-09-11T13:33:27.0112878Z ERROR   build_image:   file build_image_util.sh, line 903, called: die_err_trap '"${BUILD_LIBRARY_DIR}/extract-initramfs-from-vmlinuz.sh" "${root_fs_dir}/boot/flatcar/vmlinuz-a" "${BUILD_DIR}/tmp_initrd_contents"' '1'
2024-09-11T13:33:27.0118365Z ERROR   build_image: 
2024-09-11T13:33:27.0124923Z ERROR   build_image: Command failed:
2024-09-11T13:33:27.0132629Z ERROR   build_image:   Command '"${BUILD_LIBRARY_DIR}/extract-initramfs-from-vmlinuz.sh" "${root_fs_dir}/boot/flatcar/vmlinuz-a" "${BUILD_DIR}/tmp_initrd_contents"' exited with nonzero code: 1

@ader1990
Contributor Author

Successful build for AMD64:

 uname -a
Linux localhost 6.10.9-flatcar #1 SMP PREEMPT_DYNAMIC Wed Sep 11 17:33:15 -00 2024 x86_64 Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz GenuineIntel GNU/Linux
root@localhost ~ # cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=4089.0.0+nightly-20240910-2100-14-g5595c96aa4
VERSION_ID=4089.0.0
BUILD_ID=nightly-20240910-2100-14-g5595c96aa4
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 4089.0.0+nightly-20240910-2100-14-g5595c96aa4 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:4089.0.0+nightly-20240910-2100-14-g5595c96aa4:*:*:*:*:*:*:*"

@ader1990
Contributor Author

The amd64 bpf.execsnoop mantle test should be fixed by a new iovisor/bcc image (iovisor/bcc@5d2ef17). I triggered an image update: https://github.com/flatcar/mantle/actions/runs/10832754186/job/30057878029.

@ader1990 ader1990 force-pushed the ader1990/linux_kernel_6_10 branch from 5595c96 to 4ad039e Compare September 17, 2024 13:27
@ader1990
Contributor Author

@t-lo I observed that from Linux kernel 6.10, there is a change in name of a hyper-v daemon binary - see torvalds/linux@82b0945. Should we leave the same systemd unit name though?

I wonder how the https://github.com/microsoft/azurelinux will be doing it (have not seen yet any patch).

I am oscillating between this 4ad039e vs changing the name in all places.

@t-lo
Member

t-lo commented Sep 17, 2024

@t-lo I observed that from Linux kernel 6.10, there is a change in name of a hyper-v daemon binary - see torvalds/linux@82b0945. Should we leave the same systemd unit name though?

I wonder how the https://github.com/microsoft/azurelinux will be doing it (have not seen yet any patch).

I am oscillating between this 4ad039e vs changing the name in all places.

I think we should rename the systemd service to prevent confusion down the road.

@ader1990
Contributor Author

@t-lo I observed that from Linux kernel 6.10, there is a change in name of a hyper-v daemon binary - see torvalds/linux@82b0945. Should we leave the same systemd unit name though?
I wonder how the https://github.com/microsoft/azurelinux will be doing it (have not seen yet any patch).
I am oscillating between this 4ad039e vs changing the name in all places.

I think we should rename the systemd service to prevent confusion down the road.

The thing is that the binaries do the same thing and have the same interface, but internally they use a different implementation (uio_hv_generic). The weird part is that the old implementation is still present, but its build is disabled.

I will add a new service definition (as it also has a different device path trigger) for the new version, to keep things separate.

@ader1990
Contributor Author

The /boot partition is very close to a critical level: 49% is already used, leaving around 1.5MB free to use (presumably once the second kernel image that /boot has to hold is counted):

/dev/vda1          129039    62852     66187  49% /boot

@ader1990
Contributor Author

Note: on the AMD64 vmlinuz-a, build_library/extract-initramfs-from-vmlinuz.sh now fails because the script finds a corrupted CPIO archive first. More debugging is needed to find out why this happens in the first place (what has changed upstream).
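
The failure mode can be illustrated with a small sketch (Python). It assumes the extraction works by scanning the image for compression magic bytes and decompressing from each hit, which is the approach taken by the kernel's scripts/extract-vmlinux; whether the Flatcar script behaves exactly like this is an assumption, and the blob below is synthetic. Only xz is handled here for brevity.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # first six bytes of every .xz stream


def find_offsets(blob: bytes, magic: bytes):
    """Return every offset at which the magic byte sequence occurs."""
    offsets, pos = [], blob.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(magic, pos + 1)
    return offsets


def first_valid_stream(blob: bytes, magic: bytes = XZ_MAGIC):
    """Try each candidate offset and return the first one that decompresses."""
    for offset in find_offsets(blob, magic):
        decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
        try:
            data = decompressor.decompress(blob[offset:])
        except lzma.LZMAError:
            continue  # false positive: magic bytes without a valid stream
        if decompressor.eof:
            return offset, data
    return None


# Synthetic image: a bogus magic hit, followed later by a real xz stream.
payload = b"pretend this is a cpio archive"
real_stream = lzma.compress(payload)
blob = b"\x00" * 16 + XZ_MAGIC + b"garbage-not-a-stream" + real_stream + b"trailer"

print("candidate offsets:", find_offsets(blob, XZ_MAGIC))
print("first valid stream:", first_valid_stream(blob))
```

A tool that stops at the first magic hit would fail on this blob, while one that keeps trying later offsets recovers the payload.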

@ader1990 ader1990 self-assigned this Sep 20, 2024
@ader1990 ader1990 force-pushed the ader1990/linux_kernel_6_10 branch from 06de6c8 to f62e9ba Compare September 24, 2024 12:55
@ader1990 ader1990 changed the title Upgrade Linux kernel from 6.6 to 6.10 Upgrade Linux kernel from 6.6 to 6.12 Oct 28, 2024
@chewi
Contributor

chewi commented Apr 28, 2025

Just a heads up that the latest CI run shows that both the amd64 and arm64 vmlinuz images will be the largest they've ever been, despite my recent changes to tackle the size. They're still within the limit, but obviously it's getting very tight now. I'll keep looking at moving the image.

@jepio
Member

jepio commented Apr 28, 2025

Just a heads up that the latest CI run shows that both the amd64 and arm64 vmlinuz images will be the largest they've ever been, despite my recent changes to tackle the size. They're still within the limit, but obviously it's getting very tight now. I'll keep looking at moving the image.

How big are we now? And have we looked into what is contributing to an increase with this update?

@chewi
Contributor

chewi commented Apr 28, 2025

How big are we now? And have we looked into what is contributing to an increase with this update?

61,065,720 bytes for amd64 and 63,002,616 bytes for arm64. The size has bumped up and down for various reasons over time, but the previous biggest sizes I have recorded are 59,903,488 and 62,138,880 respectively. I have kept track of the remaining space with two such images present alongside the other files in /boot across different releases. In the worst case, we have 2,022,928 and 3,070,992 remaining respectively.
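
The arithmetic behind these figures can be sketched as follows (Python). The byte counts are the ones quoted above; splitting the worst-case free space evenly across two images is an assumption made only for illustration, and the function names are hypothetical.

```python
# Figures quoted in the comment above, in bytes.
SIZES = {
    "amd64": {"now": 61_065_720, "previous_max": 59_903_488, "worst_case_free": 2_022_928},
    "arm64": {"now": 63_002_616, "previous_max": 62_138_880, "worst_case_free": 3_070_992},
}


def growth(entry: dict) -> int:
    """Bytes by which the new image exceeds the previous largest one."""
    return entry["now"] - entry["previous_max"]


def headroom_per_image(entry: dict) -> int:
    """Further growth each of two images could absorb (even split assumed)."""
    return entry["worst_case_free"] // 2


for arch, entry in SIZES.items():
    print(f"{arch}: +{growth(entry):,} bytes over previous max, "
          f"about {headroom_per_image(entry):,} bytes of headroom per image")
```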

Some kernel modules have increased in size, but not massively. There are some new ones, but some have been removed too. There are a couple of new firmware files, but these are small. The report seems slightly bugged here.

01:57:04  @@ -355,6 +355,8 @@
01:57:04   /rootfs-1/usr/lib/firmware/rtl_nic/rtl8107e-2.fw
01:57:04   /rootfs-1/usr/lib/firmware/rtl_nic/rtl8125a-3.fw
01:57:04   /rootfs-1/usr/lib/firmware/rtl_nic/rtl8125b-2.fw
01:57:04  +/rootfs-1/usr/lib/firmware/rtl_nic/rtl8126a-2.fw
01:57:04  +/rootfs-1/usr/lib/firmware/rtl_nic/rtl8126a-3.fw
01:57:04   /rootfs-1/usr/lib/firmware/rtl_nic/rtl8153a-2.fw
01:57:04   /rootfs-1/usr/lib/firmware/rtl_nic/rtl8153a-3.fw
01:57:04   /rootfs-1/usr/lib/firmware/rtl_nic/rtl8153a-4.fw

...

01:57:04  @@ -683,6 +674,10 @@
01:57:04   /rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko.xz
01:57:04   /rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/intel/ixgbevf
01:57:04   /rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/intel/ixgbevf/ixgbevf.ko.xz
01:57:04  +/rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/intel/libeth
01:57:04  +/rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/intel/libeth/libeth.ko.xz
01:57:04  +/rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/intel/libie
01:57:04  +/rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/intel/libie/libie.ko.xz
01:57:04   /rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/jme.ko.xz
01:57:04   /rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/marvell
01:57:04   /rootfs-1/usr/lib/modules/a.b.c-flatcar/kernel/drivers/net/ethernet/marvell/skge.ko.xz

...

01:57:09  All 4 newly added files:
01:57:09  
01:57:09  ./rootfs-1/usr/lib/firmware/rtl_nic/rtl8126a-2.fw (29248 bytes, 28 kbytes)
01:57:09  ./rootfs-1/usr/lib/firmware/rtl_nic/rtl8126a-2.fw (29248 bytes, 28 kbytes)
01:57:09  ./rootfs-1/usr/lib/firmware/rtl_nic/rtl8126a-3.fw (13232 bytes, 12 kbytes)
01:57:09  ./rootfs-1/usr/lib/firmware/rtl_nic/rtl8126a-3.fw (13232 bytes, 12 kbytes)

Also bear in mind that the arm64 kernel itself cannot be compressed, but it does uniquely benefit from CONFIG_RELR, and the raw size is probably smaller anyway.

Some new drivers have presumably been added, and I would expect the kernel to generally grow in size anyway, so such a jump from 6.6 to 6.12 does not surprise me. It's just something to keep an eye on.

@jepio
Member

jepio commented Apr 29, 2025

How big are we now? And have we looked into what is contributing to an increase with this update?

...
Some new drivers have presumably been added, and I would expect the kernel to generally grow in size anyway, so such a jump from 6.6 to 6.12 does not surprise me. It's just something to keep an eye on.

Could you investigate this? There are likely some settings that got auto-enabled that are not needed.

@chewi
Contributor

chewi commented May 7, 2025

One thing I've noticed. CONFIG_MODULE_COMPRESS has been enabled. This sounds like it would save space, but the description notes that it's more efficient to disable this and rely on compression of the whole initrd instead, which makes sense. Of course, we have modules in /usr too, but we are applying btrfs compression. This may or may not be more efficient, but it's the initrd we really care about.
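
The intuition that compressing the whole archive can beat compressing each file separately is easy to demonstrate on synthetic data. The sketch below (Python) uses made-up "modules" that share a large common block, which exaggerates the effect; real kernel modules share far less, so this says nothing about the actual size difference in Flatcar.

```python
import lzma
import random


def make_files(count: int = 20, shared_size: int = 4096, seed: int = 0):
    """Build synthetic files: one shared random block plus a unique suffix."""
    rng = random.Random(seed)
    shared = bytes(rng.getrandbits(8) for _ in range(shared_size))
    return [shared + f"module-{index}".encode() for index in range(count)]


files = make_files()

# Per-file compression cannot exploit redundancy between files.
per_file_total = sum(len(lzma.compress(data)) for data in files)

# Compressing the concatenation lets the compressor reuse earlier content.
whole_archive = len(lzma.compress(b"".join(files)))

print(f"per-file total: {per_file_total} bytes")
print(f"whole archive:  {whole_archive} bytes")
```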

ader1990 and others added 7 commits May 8, 2025 05:03
pahole: added a revamped patch to remove the parallel implementation
kernel: use pahole 1.27 feature of reproducible builds
The out-of-tree nvidia driver requires symbols that are behind DRM_TTM_HELPER
if DRM_FBDEV_EMULATION is enabled, but DRM_TTM_HELPER can't be selected unless
we build more drm drivers (which is undesirable). To get out of this, disable
DRM_FBDEV_EMULATION.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Remove CONFIG_AMD_IOMMU_V2, CONFIG_FB_ARMCLCD, CONFIG_MD_LINEAR, CONFIG_NET_ACT_IPT.

Add CONFIG_MODULE_COMPRESS.

See: torvalds/linux@5a0b11a

linux: remove CONFIG_MD_LINEAR

See: torvalds/linux@849d18e

linux: remove CONFIG_NET_ACT_IPT

See: torvalds/linux@86fe596

linux: add required CONFIG_MODULE_COMPRESS=y

See: torvalds/linux@c7ff693

linux: remove CONFIG_FB_ARMCLCD

See: torvalds/linux@dee56cc
@ader1990 ader1990 force-pushed the ader1990/linux_kernel_6_10 branch from f0479a5 to 288e8c9 Compare May 8, 2025 05:04
@ader1990
Contributor Author

ader1990 commented May 8, 2025

One thing I've noticed. CONFIG_MODULE_COMPRESS has been enabled. This sounds like it would save space, but the description notes that it's more efficient to disable this and rely on compression of the whole initrd instead, which makes sense. Of course, we have modules in /usr too, but we are applying btrfs compression. This may or may not be more efficient, but it's the initrd we really care about.

CONFIG_MODULE_COMPRESS was enabled because it is a new requirement for a CONFIG_MODULE_* option that we already had enabled: CONFIG_MODULE_COMPRESS_XZ=y. Without CONFIG_MODULE_COMPRESS set, the build fails because CONFIG_MODULE_COMPRESS_XZ=y is set.

See the commit message here: 74c7e5d and the linked Linux commit description for the details on why: torvalds/linux@c7ff693
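
A rough way to spot this kind of mismatch before a build is to check a config for an algorithm option that is enabled without its parent. The sketch below (Python) uses simple name-prefix matching, which is a simplification; the real dependency is defined in the kernel's Kconfig files, and the helper names and sample configs are hypothetical.

```python
def enabled_options(config_text: str) -> set:
    """Collect option names that are set to 'y' or 'm'."""
    options = set()
    for raw_line in config_text.splitlines():
        name, separator, value = raw_line.strip().partition("=")
        if separator and name.startswith("CONFIG_") and value in ("y", "m"):
            options.add(name)
    return options


def children_missing_parent(config_text: str, parent: str = "CONFIG_MODULE_COMPRESS"):
    """Return enabled PARENT_* options when PARENT itself is not enabled."""
    options = enabled_options(config_text)
    children = sorted(name for name in options if name.startswith(parent + "_"))
    return children if children and parent not in options else []


# Hypothetical fragments: the first mirrors the pre-upgrade situation.
OLD_FRAGMENT = "CONFIG_MODULE_COMPRESS_XZ=y\n"
NEW_FRAGMENT = "CONFIG_MODULE_COMPRESS=y\nCONFIG_MODULE_COMPRESS_XZ=y\n"

print(children_missing_parent(OLD_FRAGMENT))
print(children_missing_parent(NEW_FRAGMENT))
```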

@chewi
Contributor

chewi commented May 8, 2025

I see. I did some comparisons with both options disabled. The combined kernel image does indeed come out 1MB smaller. It's harder to measure the impact on /usr. zstd is used with btrfs, so if I recompress all the coreos-modules files with zstd, that comes out 119MB larger. This may not be representative. I think we can spare 1MB for now, and I still hope to move the kernel, so let's leave this as it is.

@ader1990
Contributor Author

ader1990 commented May 8, 2025

I see. I did some comparisons with both options disabled. The combined kernel image does indeed come out 1MB smaller. It's harder to measure the impact on /usr. zstd is used with btrfs, so if I recompress all the coreos-modules files with zstd, that comes out 119MB larger. This may not be representative. I think we can spare 1MB for now, and I still hope to move the kernel, so let's leave this as it is.

What I was trying to say is that the CONFIG_MODULE_COMPRESS=y was a config that I added to keep things as-is. The purpose of this PR is to change the kernel from 6.6 to 6.12, and not to change current Flatcar configurations or behaviour.

@chewi
Contributor

chewi commented May 8, 2025

That's okay, I'm just trying to find reasons for the size increase.

CONFIG_MD_LINEAR has changed from m to not set, which surprised me a little. I can see that it was temporarily dropped from the kernel entirely but then restored (see torvalds/linux@2b7f974) when they realised that quite a few people actually use it. Maybe we should turn it back on.

Some of the cryptographic options have changed from m to y. This is at least partially due to the new CONFIG_TCG_TPM2_HMAC, so there is a reason.

I can't see anything else obviously unneeded and/or huge.

Projects
Status: ⚒️ In Progress
8 participants