[Deepin-Kernel-SIG] [linux 6.6-y] [Openeuler] cgroup: disable kernel memory accounting for all memory cgroups by de… #967

Open
wants to merge 1 commit into base: linux-6.6.y

opsiff (Member) commented Jul 22, 2025

…fault

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLND
CVE: NA


The kernel memory accounting for all memory cgroups is not stable, and it will cause a 100% regression in hackbench compared with kernel-4.19, so disable it by default. We can use the following command line to enable or disable it:
cgroup.memory=kmem or cgroup.memory=nokmem.
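
To check which setting a machine was booted with, the parameter can be picked out of the kernel command line (for example the contents of /proc/cmdline). The following is a minimal userspace sketch, not kernel code: the helper name find_cgroup_memory_param is hypothetical, and keeping only the last occurrence is a simplification made for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace helper: locate "cgroup.memory=" in a
 * space-separated command line and copy its value into out.
 * Returns 1 if the parameter is present, 0 otherwise.
 */
static int find_cgroup_memory_param(const char *cmdline, char *out, size_t outlen)
{
    static const char key[] = "cgroup.memory=";
    const size_t keylen = sizeof(key) - 1;
    const char *p = cmdline;
    int found = 0;

    if (outlen == 0)
        return 0;
    out[0] = '\0';

    while (*p) {
        /* Length of the current parameter, up to a space or newline. */
        size_t len = strcspn(p, " \n");

        if (len >= keylen && strncmp(p, key, keylen) == 0) {
            size_t vlen = len - keylen;

            if (vlen >= outlen)
                vlen = outlen - 1;      /* truncate to fit the buffer */
            memcpy(out, p + keylen, vlen);
            out[vlen] = '\0';
            found = 1;                  /* simplification: last occurrence is kept */
        }
        p += len;
        p += strspn(p, " \n");          /* skip separators */
    }
    return found;
}
```

With the command line "root=/dev/sda1 cgroup.memory=kmem quiet" the helper reports the value "kmem"; when the parameter is absent it returns 0 and, with this patch applied, the default (accounting disabled) is in effect.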

Summary by Sourcery

Disable kernel memory accounting for all memory cgroups by default to avoid performance regressions and introduce a command-line option to re-enable it.

New Features:

  • Add cgroup.memory=kmem command-line token to explicitly enable kernel memory accounting

Bug Fixes:

  • Disable unstable kernel memory accounting for all memory cgroups by default to prevent performance regressions in hackbench

Documentation:

  • Update memory cgroup documentation to reflect default disabling of kernel memory accounting and the new kmem/nokmem options

…fault

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8QLND
CVE: NA

----------------------------------------

The kernel memory accounting for all memory cgroups is
not stable, and it will cause a 100% regression in
hackbench compared with kernel-4.19, so disable it by
default. We can use the following command line to enable
or disable it:
cgroup.memory=kmem or cgroup.memory=nokmem.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: chenridong <chenridong@huawei.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>

sourcery-ai bot commented Jul 22, 2025

Reviewer's Guide

Disable kernel memory accounting for all memory cgroups by default and introduce a "kmem" kernel parameter to re-enable it, with corresponding documentation updates.

Class diagram for cgroup memory kernel parameter handling

classDiagram
    class memcontrol.c {
        +bool cgroup_memory_nokmem
        +int __init cgroup_memory(char *s)
    }
    memcontrol.c : cgroup_memory_nokmem = true (default)
    memcontrol.c : cgroup_memory(char *s) parses "kmem" and "nokmem" tokens

Flow diagram for cgroup.memory kernel parameter parsing

flowchart TD
    A[Boot with kernel parameters] --> B{cgroup.memory token}
    B -->|kmem| C[Set cgroup_memory_nokmem = false]
    B -->|nokmem| D[Set cgroup_memory_nokmem = true]
    B -->|other| E[No change to cgroup_memory_nokmem]
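
The flow above can be modelled in userspace as follows. This is a sketch under stated assumptions rather than the code in mm/memcontrol.c: model_nokmem stands in for cgroup_memory_nokmem, model_cgroup_memory and token_is are hypothetical names, and the tokenizing is done with strcspn so the example stays portable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel flag; true means accounting is off. */
static bool model_nokmem = true;   /* default after this patch */

/* Compare a length-delimited token against a NUL-terminated name. */
static bool token_is(const char *tok, size_t len, const char *name)
{
    return strlen(name) == len && strncmp(tok, name, len) == 0;
}

/* Walk a comma-separated value such as "nosocket,kmem" and update the flag. */
static void model_cgroup_memory(const char *value)
{
    while (*value) {
        size_t len = strcspn(value, ",");

        if (token_is(value, len, "nokmem"))
            model_nokmem = true;
        else if (token_is(value, len, "kmem"))
            model_nokmem = false;
        /* any other token leaves the flag unchanged */

        value += len;
        if (*value == ',')
            value++;
    }
}
```

Calling model_cgroup_memory("nosocket,kmem") clears the flag, while an unrelated token such as "nosocket" alone leaves the default in place.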

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Default-disable kernel memory accounting | Initialize cgroup_memory_nokmem to true by default | mm/memcontrol.c |
| Extend kernel parameter parsing for kmem option | Add handling of "kmem" token to set cgroup_memory_nokmem back to false | mm/memcontrol.c |
| Update documentation to reflect new default and kmem option | Revise memory cgroup admin-guide to note default disable; add "kmem" parameter description in kernel-parameters.txt | Documentation/admin-guide/cgroup-v1/memory.rst, Documentation/admin-guide/kernel-parameters.txt |

deepin-ci-robot commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

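deepin-ci-robot commented:

deepin pr auto review

Code review comments:

  1. Documentation update

    • In memory.rst, the documentation describes the changes to the default state of kernel memory accounting and to the boot parameters. Please confirm that these changes match the actual code and that the documentation is consistent with the code logic.
  2. Code logic

    • In mm/memcontrol.c, the default value of cgroup_memory_nokmem is changed to true, which means kernel memory accounting is disabled by default. If this is intentional, the documentation should be updated to reflect the change.
  3. Code style

    • In mm/memcontrol.c, the else if statement in cgroup_memory should keep consistent indentation to improve readability.
  4. Security

    • No directly security-related changes were found, but it is important to make sure all user input is validated and handled correctly, especially when processing kernel parameters.
  5. Performance

    • Enabling or disabling kernel memory accounting may affect system performance, especially when memory is tight. Performance testing is recommended to make sure these changes do not introduce performance problems.
  6. Error handling

    • In cgroup_memory, an unknown parameter should go through an error-handling path instead of simply being ignored, for example by printing an error message or returning an error code.

In summary, the change appears intended to adjust the behavior of kernel memory accounting, but it needs to stay consistent with the documentation and the rest of the code, and it should be tested thoroughly to make sure no new problems are introduced.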

opsiff (Member, Author) commented Jul 22, 2025

Test result:

After patch:

Benchmark Run: Tue Jul 22 2025 18:35:19 - 18:42:02
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 50758627.6 lps (10.0 s, 1 samples)
Double-Precision Whetstone 5025.3 MWIPS (10.0 s, 1 samples)
Execl Throughput 4715.9 lps (29.0 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 679732.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 185055.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 2070732.0 KBps (30.0 s, 1 samples)
Pipe Throughput 1747990.9 lps (10.0 s, 1 samples)
Pipe-based Context Switching 134971.1 lps (10.0 s, 1 samples)
Process Creation 10387.2 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 12976.8 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 5122.4 lpm (60.0 s, 1 samples)
System Call Overhead 1455119.4 lps (10.0 s, 1 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 50758627.6 4349.5
Double-Precision Whetstone 55.0 5025.3 913.7
Execl Throughput 43.0 4715.9 1096.7
File Copy 1024 bufsize 2000 maxblocks 3960.0 679732.0 1716.5
File Copy 256 bufsize 500 maxblocks 1655.0 185055.0 1118.2
File Copy 4096 bufsize 8000 maxblocks 5800.0 2070732.0 3570.2
Pipe Throughput 12440.0 1747990.9 1405.1
Pipe-based Context Switching 4000.0 134971.1 337.4
Process Creation 126.0 10387.2 824.4
Shell Scripts (1 concurrent) 42.4 12976.8 3060.6
Shell Scripts (8 concurrent) 6.0 5122.4 8537.4
System Call Overhead 15000.0 1455119.4 970.1
========
System Benchmarks Index Score 1606.7


Benchmark Run: Tue Jul 22 2025 18:42:02 - 18:48:45
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables 207921797.3 lps (10.0 s, 1 samples)
Double-Precision Whetstone 36513.3 MWIPS (10.0 s, 1 samples)
Execl Throughput 25336.3 lps (29.1 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 3781097.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 1047945.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 8073151.0 KBps (30.0 s, 1 samples)
Pipe Throughput 9617995.8 lps (10.0 s, 1 samples)
Pipe-based Context Switching 1193067.8 lps (10.0 s, 1 samples)
Process Creation 55236.8 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 44385.3 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 5861.9 lpm (60.0 s, 1 samples)
System Call Overhead 9307530.4 lps (10.0 s, 1 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 207921797.3 17816.8
Double-Precision Whetstone 55.0 36513.3 6638.8
Execl Throughput 43.0 25336.3 5892.2
File Copy 1024 bufsize 2000 maxblocks 3960.0 3781097.0 9548.2
File Copy 256 bufsize 500 maxblocks 1655.0 1047945.0 6332.0
File Copy 4096 bufsize 8000 maxblocks 5800.0 8073151.0 13919.2
Pipe Throughput 12440.0 9617995.8 7731.5
Pipe-based Context Switching 4000.0 1193067.8 2982.7
Process Creation 126.0 55236.8 4383.9
Shell Scripts (1 concurrent) 42.4 44385.3 10468.2
Shell Scripts (8 concurrent) 6.0 5861.9 9769.8
System Call Overhead 15000.0 9307530.4 6205.0
========
System Benchmarks Index Score 7608.4

Before patch:

Benchmark Run: Sun Jul 20 2025 15:38:15 - 15:44:59
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 49626251.2 lps (10.0 s, 1 samples)
Double-Precision Whetstone 5025.0 MWIPS (10.0 s, 1 samples)
Execl Throughput 4894.6 lps (29.7 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 736328.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 196596.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 2183889.0 KBps (30.0 s, 1 samples)
Pipe Throughput 1395543.8 lps (10.0 s, 1 samples)
Pipe-based Context Switching 128875.8 lps (10.0 s, 1 samples)
Process Creation 11010.0 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 13300.7 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 5101.6 lpm (60.0 s, 1 samples)
System Call Overhead 1439217.5 lps (10.0 s, 1 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 49626251.2 4252.5
Double-Precision Whetstone 55.0 5025.0 913.6
Execl Throughput 43.0 4894.6 1138.3
File Copy 1024 bufsize 2000 maxblocks 3960.0 736328.0 1859.4
File Copy 256 bufsize 500 maxblocks 1655.0 196596.0 1187.9
File Copy 4096 bufsize 8000 maxblocks 5800.0 2183889.0 3765.3
Pipe Throughput 12440.0 1395543.8 1121.8
Pipe-based Context Switching 4000.0 128875.8 322.2
Process Creation 126.0 11010.0 873.8
Shell Scripts (1 concurrent) 42.4 13300.7 3137.0
Shell Scripts (8 concurrent) 6.0 5101.6 8502.7
System Call Overhead 15000.0 1439217.5 959.5
========
System Benchmarks Index Score 1607.3


Benchmark Run: Sun Jul 20 2025 15:44:59 - 15:51:42
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables 207921608.7 lps (10.0 s, 1 samples)
Double-Precision Whetstone 36527.7 MWIPS (10.0 s, 1 samples)
Execl Throughput 26542.3 lps (29.1 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 3965438.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 1118004.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 8322845.0 KBps (30.0 s, 1 samples)
Pipe Throughput 8503417.5 lps (10.0 s, 1 samples)
Pipe-based Context Switching 986744.0 lps (10.0 s, 1 samples)
Process Creation 53402.1 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 42831.1 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 5578.8 lpm (60.0 s, 1 samples)
System Call Overhead 9336168.9 lps (10.0 s, 1 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 207921608.7 17816.8
Double-Precision Whetstone 55.0 36527.7 6641.4
Execl Throughput 43.0 26542.3 6172.6
File Copy 1024 bufsize 2000 maxblocks 3960.0 3965438.0 10013.7
File Copy 256 bufsize 500 maxblocks 1655.0 1118004.0 6755.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 8322845.0 14349.7
Pipe Throughput 12440.0 8503417.5 6835.5
Pipe-based Context Switching 4000.0 986744.0 2466.9
Process Creation 126.0 53402.1 4238.3
Shell Scripts (1 concurrent) 42.4 42831.1 10101.7
Shell Scripts (8 concurrent) 6.0 5578.8 9297.9
System Call Overhead 15000.0 9336168.9 6224.1
========
System Benchmarks Index Score 7458.2
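
For comparing the two runs, the relative change in the overall index score can be worked out directly. The snippet below is only an arithmetic sketch: pct_change is a hypothetical helper, and the constants are the four "System Benchmarks Index Score" values copied from the output above.

```c
/* Relative change from before to after, in percent. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Index scores copied from the runs above. */
static const double score_1copy_before = 1607.3, score_1copy_after = 1606.7;
static const double score_8copy_before = 7458.2, score_8copy_after = 7608.4;
```

With these inputs the single-copy score moves by roughly -0.04% and the 8-copy score by roughly +2.0%. Each test reports "1 samples", so small differences may be within run-to-run noise.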

sourcery-ai bot left a comment

Hey @opsiff - I've reviewed your changes - here's some feedback:

  • Consider using the kernel’s match_token helper instead of manual strcmp loops for parsing the cgroup.memory tokens to make the code more maintainable and extensible.
  • Add explicit conflict detection or precedence handling when both “kmem” and “nokmem” are passed so users aren’t left with ambiguous behavior.
  • Double-check that cgroup v2’s memory controller either matches this default-off kmem behavior or provides its own symmetric option to avoid inconsistent defaults across cgroup versions.
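
One way the first two points could look is sketched below in userspace C. It is an illustration only: the token table is a plain array and not the kernel's match_token interface, and kmem_choice, kmem_result and parse_kmem_tokens are hypothetical names. The sketch keeps "last token wins" as the precedence rule and additionally records that a conflict was seen, so a caller could log a warning.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum kmem_choice { KMEM_UNSET, KMEM_ON, KMEM_OFF };

struct kmem_result {
    enum kmem_choice choice;  /* last explicit request, if any */
    bool conflict;            /* both "kmem" and "nokmem" were seen */
};

/* Hypothetical token table; adding an option means adding one row. */
static const struct { const char *name; enum kmem_choice choice; } kmem_tokens[] = {
    { "kmem",   KMEM_ON  },
    { "nokmem", KMEM_OFF },
};

static struct kmem_result parse_kmem_tokens(const char *value)
{
    struct kmem_result r = { KMEM_UNSET, false };

    while (*value) {
        size_t len = strcspn(value, ",");
        size_t i;

        for (i = 0; i < sizeof(kmem_tokens) / sizeof(kmem_tokens[0]); i++) {
            const char *name = kmem_tokens[i].name;

            if (strlen(name) != len || strncmp(value, name, len) != 0)
                continue;
            /* A different earlier choice means the options contradict. */
            if (r.choice != KMEM_UNSET && r.choice != kmem_tokens[i].choice)
                r.conflict = true;
            r.choice = kmem_tokens[i].choice;   /* last token wins */
        }
        value += len;
        if (*value == ',')
            value++;
    }
    return r;
}
```

For "kmem,nokmem" the result is KMEM_OFF with conflict set; for "nosocket" the choice stays KMEM_UNSET, which a caller would map to the built-in default.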

opsiff (Member, Author) commented Jul 22, 2025

TL;DR: execl from 5356 to 5573 (+4%)

Copilot AI left a comment

Pull Request Overview

This PR disables kernel memory accounting for all memory cgroups by default to address performance regression issues, specifically a 100% regression in hackbench compared to kernel-4.19. The change introduces a command-line option to re-enable the feature when needed.

  • Changes the default state of kernel memory accounting from enabled to disabled
  • Adds a new "kmem" command-line parameter to explicitly enable kernel memory accounting
  • Updates documentation to reflect the new default behavior and available options

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| mm/memcontrol.c | Changes default value of cgroup_memory_nokmem to true and adds kmem parameter handling |
| Documentation/admin-guide/kernel-parameters.txt | Documents the new kmem command-line option |
| Documentation/admin-guide/cgroup-v1/memory.rst | Updates memory cgroup documentation to reflect disabled-by-default behavior |

@@ -7618,6 +7618,8 @@ static int __init cgroup_memory(char *s)
 			cgroup_memory_nosocket = true;
 		if (!strcmp(token, "nokmem"))
 			cgroup_memory_nokmem = true;
+		else if (!strcmp(token, "kmem"))

Copilot AI Jul 23, 2025

The else-if condition should be changed to a separate if statement for consistency with the existing pattern. The current code has separate if statements for 'nosocket', 'nokmem', and 'nobpf' tokens, but uses else-if for 'kmem', which creates inconsistent structure.

Suggested change
-		else if (!strcmp(token, "kmem"))
+		if (!strcmp(token, "kmem"))
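
Since a single token cannot equal both "nokmem" and "kmem", the two forms should behave the same and the choice is a matter of style. The userspace sketch below compares them side by side; apply_else_if, apply_separate_if and variants_agree are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Variant 1: "kmem" chained with else-if, as in the patch. */
static bool apply_else_if(bool nokmem, const char *token)
{
    if (!strcmp(token, "nokmem"))
        nokmem = true;
    else if (!strcmp(token, "kmem"))
        nokmem = false;
    return nokmem;
}

/* Variant 2: independent if statements, as in the suggested change. */
static bool apply_separate_if(bool nokmem, const char *token)
{
    if (!strcmp(token, "nokmem"))
        nokmem = true;
    if (!strcmp(token, "kmem"))
        nokmem = false;
    return nokmem;
}

/* Returns true when both variants agree for every sample token and start state. */
static bool variants_agree(void)
{
    static const char *const samples[] = { "kmem", "nokmem", "nosocket", "nobpf", "" };
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        if (apply_else_if(true, samples[i]) != apply_separate_if(true, samples[i]))
            return false;
        if (apply_else_if(false, samples[i]) != apply_separate_if(false, samples[i]))
            return false;
    }
    return true;
}
```

The else-if form skips one strcmp when the token is "nokmem"; the separate-if form matches the layout of the neighbouring checks.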

-at boot time. In this case, kernel memory will not be accounted at all.
+Kernel memory accounting is disabled for all memory cgroups by default. But
+it can be enabled system-wide by passing cgroup.memory=kmem to the kernel
+at boot time. In this case, kernel memory will all be accounted.

Copilot AI Jul 23, 2025

The phrase 'kernel memory will all be accounted' is grammatically awkward. It should be 'all kernel memory will be accounted' or 'kernel memory will be accounted'.

Suggested change
-at boot time. In this case, kernel memory will all be accounted.
+at boot time. In this case, all kernel memory will be accounted.

Avenger-285714 (Collaborator) commented:

@winnscode what do you think?

@@ -87,7 +87,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(int_active_memcg);
 static bool cgroup_memory_nosocket __ro_after_init;
 
 /* Kernel memory accounting disabled? */
-static bool cgroup_memory_nokmem __ro_after_init;
+static bool cgroup_memory_nokmem __ro_after_init = true;
Collaborator commented:

Would it be less intrusive to leave this line unchanged?

Labels
None yet
Projects
None yet

3 participants