Commit a7f546c

Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
2 parents c899bd2 + b9a4770 commit a7f546c

5 files changed (+1428 -509 lines)

Documentation/admin-guide/cgroup-v2.rst

Lines changed: 98 additions & 32 deletions
@@ -393,6 +393,13 @@ constraint, a threaded controller must be able to handle competition
 between threads in a non-leaf cgroup and its child cgroups. Each
 threaded controller defines how such competitions are handled.
 
+Currently, the following controllers are threaded and can be enabled
+in a threaded cgroup::
+
+  - cpu
+  - cpuset
+  - perf_event
+  - pids
 
 [Un]populated Notification
 --------------------------
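The list of threaded controllers added above can be checked mechanically before enabling controllers in a threaded cgroup. This is a minimal sketch, assuming hypothetical helpers `is_threaded_controller()` and `count_non_threaded()` that only inspect a request string in the `+name`/`-name` format written to `cgroup.subtree_control`; nothing here touches a real cgroup filesystem, and the controller list is copied from the text above, which describes it as the current set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Threaded controllers as listed in this revision of cgroup-v2.rst. */
static const char *const threaded_controllers[] = {
	"cpu", "cpuset", "perf_event", "pids",
};

/* Return true if @name is one of the documented threaded controllers. */
static bool is_threaded_controller(const char *name)
{
	size_t n = sizeof(threaded_controllers) / sizeof(threaded_controllers[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(name, threaded_controllers[i]) == 0)
			return true;
	return false;
}

/*
 * Count the "+name" entries in a subtree_control-style request
 * (e.g. "+cpu +memory") that would enable a non-threaded controller.
 * "-name" entries only disable a controller and are ignored, as are
 * entries without a prefix.
 */
static int count_non_threaded(const char *request)
{
	char buf[256];
	int n = 0;

	assert(request != NULL);
	/* strtok() modifies its input, so tokenize a bounded copy. */
	snprintf(buf, sizeof(buf), "%s", request);
	for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		if (*tok != '+')
			continue;
		if (!is_threaded_controller(tok + 1))
			n++;
	}
	return n;
}
```

A non-zero count from `count_non_threaded("+cpu +pids +memory")` (here 1, for `memory`) flags a request that names a controller outside the documented threaded set.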
@@ -2266,6 +2273,49 @@ Cpuset Interface Files
 
 	Its value will be affected by memory nodes hotplug events.
 
+  cpuset.cpus.exclusive
+	A read-write multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists all the exclusive CPUs that are allowed to be used
+	to create a new cpuset partition. Its value is not used
+	unless the cgroup becomes a valid partition root. See the
+	"cpuset.cpus.partition" section below for a description of what
+	a cpuset partition is.
+
+	When the cgroup becomes a partition root, the actual exclusive
+	CPUs that are allocated to that partition are listed in
+	"cpuset.cpus.exclusive.effective" which may be different
+	from "cpuset.cpus.exclusive". If "cpuset.cpus.exclusive"
+	has previously been set, "cpuset.cpus.exclusive.effective"
+	is always a subset of it.
+
+	Users can manually set it to a value that is different from
+	"cpuset.cpus". The only constraint in setting it is that the
+	list of CPUs must be exclusive with respect to its sibling.
+
+	For a parent cgroup, any one of its exclusive CPUs can only
+	be distributed to at most one of its child cgroups. Having an
+	exclusive CPU appearing in two or more of its child cgroups is
+	not allowed (the exclusivity rule). A value that violates the
+	exclusivity rule will be rejected with a write error.
+
+	The root cgroup is a partition root and all its available CPUs
+	are in its exclusive CPU set.
+
+  cpuset.cpus.exclusive.effective
+	A read-only multiple values file which exists on all non-root
+	cpuset-enabled cgroups.
+
+	This file shows the effective set of exclusive CPUs that
+	can be used to create a partition root. The content of this
+	file will always be a subset of "cpuset.cpus" and its parent's
+	"cpuset.cpus.exclusive.effective" if its parent is not the root
+	cgroup. It will also be a subset of "cpuset.cpus.exclusive"
+	if it is set. If "cpuset.cpus.exclusive" is not set, it is
+	treated to have an implicit value of "cpuset.cpus" in the
+	formation of local partition.
+
   cpuset.cpus.partition
 	A read-write single value file which exists on non-root
 	cpuset-enabled cgroups. This flag is owned by the parent cgroup
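The exclusivity rule and the subset relations for "cpuset.cpus.exclusive.effective" described in this hunk can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel's implementation: CPU sets are 64-bit masks (CPUs 0 to 63 only), and `parse_cpulist()`, `siblings_exclusive()` and `excl_effective_bound()` are hypothetical names. The bound only encodes the documented subset relations; the kernel may report a smaller set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a cpuset list such as "0-3,8" into a bitmask.
 * Simplifications: CPUs 0..63 only and no trailing newline.
 * Returns false on malformed or out-of-range input.
 */
static bool parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	assert(s != NULL && mask != NULL);
	while (*s) {
		char *end;
		unsigned long lo = strtoul(s, &end, 10);
		unsigned long hi = lo;

		if (end == s)
			return false;
		if (*end == '-') {	/* range "lo-hi" */
			const char *start = end + 1;

			hi = strtoul(start, &end, 10);
			if (end == start)
				return false;
		}
		if (lo > hi || hi > 63)
			return false;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return false;
		s = end;
	}
	*mask = m;
	return true;
}

/*
 * Exclusivity rule: a CPU may appear in the exclusive set of at most
 * one child. Returns false if any CPU is shared by two of the @n
 * sibling masks in @excl.
 */
static bool siblings_exclusive(const uint64_t *excl, size_t n)
{
	uint64_t seen = 0;

	for (size_t i = 0; i < n; i++) {
		if (seen & excl[i])
			return false;
		seen |= excl[i];
	}
	return true;
}

/*
 * Largest set permitted for cpuset.cpus.exclusive.effective by the
 * documented subset relations. An unset cpuset.cpus.exclusive is
 * treated as equal to cpuset.cpus (as for a local partition), and the
 * parent's effective set only applies when the parent is not the root.
 */
static uint64_t excl_effective_bound(uint64_t cpus, bool excl_set,
				     uint64_t excl, bool parent_is_root,
				     uint64_t parent_excl_eff)
{
	uint64_t bound = cpus & (excl_set ? excl : cpus);

	if (!parent_is_root)
		bound &= parent_excl_eff;
	return bound;
}
```

For example, siblings with exclusive lists "0-1" and "2-3" pass the check, while "0-2" and "2-3" fail it because CPU 2 appears in both, which corresponds to the write error described above.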
@@ -2279,26 +2329,41 @@ Cpuset Interface Files
 	  "isolated" Partition root without load balancing
 	  ========== =====================================
 
-	The root cgroup is always a partition root and its state
-	cannot be changed. All other non-root cgroups start out as
-	"member".
+	A cpuset partition is a collection of cpuset-enabled cgroups with
+	a partition root at the top of the hierarchy and its descendants
+	except those that are separate partition roots themselves and
+	their descendants. A partition has exclusive access to the
+	set of exclusive CPUs allocated to it. Other cgroups outside
+	of that partition cannot use any CPUs in that set.
+
+	There are two types of partitions - local and remote. A local
+	partition is one whose parent cgroup is also a valid partition
+	root. A remote partition is one whose parent cgroup is not a
+	valid partition root itself. Writing to "cpuset.cpus.exclusive"
+	is optional for the creation of a local partition as its
+	"cpuset.cpus.exclusive" file will assume an implicit value that
+	is the same as "cpuset.cpus" if it is not set. Writing the
+	proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
+	before the target partition root is mandatory for the creation
+	of a remote partition.
+
+	Currently, a remote partition cannot be created under a local
+	partition. All the ancestors of a remote partition root except
+	the root cgroup cannot be a partition root.
+
+	The root cgroup is always a partition root and its state cannot
+	be changed. All other non-root cgroups start out as "member".
 
 	When set to "root", the current cgroup is the root of a new
-	partition or scheduling domain that comprises itself and all
-	its descendants except those that are separate partition roots
-	themselves and their descendants.
+	partition or scheduling domain. The set of exclusive CPUs is
+	determined by the value of its "cpuset.cpus.exclusive.effective".
 
-	When set to "isolated", the CPUs in that partition root will
+	When set to "isolated", the CPUs in that partition will
 	be in an isolated state without any load balancing from the
 	scheduler. Tasks placed in such a partition with multiple
 	CPUs should be carefully distributed and bound to each of the
 	individual CPUs for optimal performance.
 
-	The value shown in "cpuset.cpus.effective" of a partition root
-	is the CPUs that the partition root can dedicate to a potential
-	new child partition root. The new child subtracts available
-	CPUs from its parent "cpuset.cpus.effective".
-
 	A partition root ("root" or "isolated") can be in one of the
 	two possible states - valid or invalid. An invalid partition
 	root is in a degraded state where some state information may
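The local/remote distinction and the restriction on remote partitions in this hunk depend only on a cgroup's ancestors, so they can be expressed as a walk up the hierarchy. A minimal sketch, assuming a hypothetical `struct cg` that records only each cgroup's parent and whether it is a valid partition root (invalid partition roots are not modelled); it is not how the kernel stores this state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model holding only what the classification needs. */
struct cg {
	const struct cg *parent;	/* NULL for the root cgroup */
	bool valid_root;		/* valid partition root? (root cgroup: true) */
};

enum part_kind { PART_LOCAL, PART_REMOTE, PART_NOT_ALLOWED };

/*
 * Classify a non-root cgroup that is about to become a partition root:
 *  - local if its parent is a valid partition root;
 *  - remote otherwise, provided no ancestor other than the root cgroup
 *    is a partition root (a remote partition cannot sit under a local one).
 */
static enum part_kind classify_partition(const struct cg *cg)
{
	/* The root cgroup is always a partition root, so it is never classified. */
	assert(cg != NULL && cg->parent != NULL);

	if (cg->parent->valid_root)
		return PART_LOCAL;

	/* Walk up, stopping before the root cgroup (the one with no parent). */
	for (const struct cg *a = cg->parent; a->parent != NULL; a = a->parent)
		if (a->valid_root)
			return PART_NOT_ALLOWED;
	return PART_REMOTE;
}
```

With a hierarchy root -> A (member) -> B, making B a partition root yields a remote partition; if A were itself a partition root, B would be local instead.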
@@ -2321,37 +2386,33 @@ Cpuset Interface Files
 	In the case of an invalid partition root, a descriptive string on
 	why the partition is invalid is included within parentheses.
 
-	For a partition root to become valid, the following conditions
+	For a local partition root to be valid, the following conditions
 	must be met.
 
-	1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
-	   are not shared by any of its siblings (exclusivity rule).
-	2) The parent cgroup is a valid partition root.
-	3) The "cpuset.cpus" is not empty and must contain at least
-	   one of the CPUs from parent's "cpuset.cpus", i.e. they overlap.
-	4) The "cpuset.cpus.effective" cannot be empty unless there is
+	1) The parent cgroup is a valid partition root.
+	2) The "cpuset.cpus.exclusive.effective" file cannot be empty,
+	   though it may contain offline CPUs.
+	3) The "cpuset.cpus.effective" cannot be empty unless there is
 	   no task associated with this partition.
 
-	External events like hotplug or changes to "cpuset.cpus" can
-	cause a valid partition root to become invalid and vice versa.
-	Note that a task cannot be moved to a cgroup with empty
-	"cpuset.cpus.effective".
+	For a remote partition root to be valid, all the above conditions
+	except the first one must be met.
 
-	For a valid partition root with the sibling cpu exclusivity
-	rule enabled, changes made to "cpuset.cpus" that violate the
-	exclusivity rule will invalidate the partition as well as its
-	sibling partitions with conflicting cpuset.cpus values. So
-	care must be taking in changing "cpuset.cpus".
+	External events like hotplug or changes to "cpuset.cpus" or
+	"cpuset.cpus.exclusive" can cause a valid partition root to
+	become invalid and vice versa. Note that a task cannot be
+	moved to a cgroup with empty "cpuset.cpus.effective".
 
 	A valid non-root parent partition may distribute out all its CPUs
-	to its child partitions when there is no task associated with it.
+	to its child local partitions when there is no task associated
+	with it.
 
-	Care must be taken to change a valid partition root to
-	"member" as all its child partitions, if present, will become
+	Care must be taken to change a valid partition root to "member"
+	as all its child local partitions, if present, will become
 	invalid causing disruption to tasks running in those child
 	partitions. These inactivated partitions could be recovered if
 	their parent is switched back to a partition root with a proper
-	set of "cpuset.cpus".
+	value in "cpuset.cpus" or "cpuset.cpus.exclusive".
 
 	Poll and inotify events are triggered whenever the state of
 	"cpuset.cpus.partition" changes. That includes changes caused
@@ -2361,6 +2422,11 @@ Cpuset Interface Files
 	to "cpuset.cpus.partition" without the need to do continuous
 	polling.
 
+	A user can pre-configure certain CPUs to an isolated state
+	with load balancing disabled at boot time with the "isolcpus"
+	kernel boot command line option. If those CPUs are to be put
+	into a partition, they have to be used in an isolated partition.
+
 
 Device controller
 -----------------
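The isolcpus note added above amounts to a constraint on the value written to "cpuset.cpus.partition". A minimal sketch, assuming a hypothetical `partition_mode_ok()` helper and reading the text as "any overlap with the isolcpus set requires the isolated mode"; CPU sets are 64-bit masks, and the helper checks only this one rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Return true if writing @mode ("root" or "isolated") is consistent with
 * the isolcpus rule, given the partition's exclusive CPUs @part_cpus and
 * the CPUs isolated at boot time with "isolcpus" (@isolcpus).
 */
static bool partition_mode_ok(const char *mode, uint64_t part_cpus,
			      uint64_t isolcpus)
{
	assert(mode != NULL);

	/* CPUs pre-isolated at boot may only be used in an isolated partition. */
	if (part_cpus & isolcpus)
		return strcmp(mode, "isolated") == 0;
	return true;
}
```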

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 4 additions & 0 deletions
@@ -580,6 +580,10 @@
 			named mounts. Specifying both "all" and "named" disables
 			all v1 hierarchies.
 
+	cgroup_favordynmods= [KNL] Enable or Disable favordynmods.
+			Format: { "true" | "false" }
+			Defaults to the value of CONFIG_CGROUP_FAVOR_DYNMODS.
+
 	cgroup.memory= [KNL] Pass options to the cgroup memory controller.
 			Format: <string>
 			nosocket -- Disable socket memory accounting.
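The new boot parameter only overrides the build-time default, mirroring how `have_favordynmods` is initialised from `IS_ENABLED(CONFIG_CGROUP_FAVOR_DYNMODS)` and then updated by the `__setup` handler in cgroup.c. The sketch below assumes a hypothetical `favordynmods_effective()` helper that scans a command-line string such as the contents of /proc/cmdline. It recognises only the documented "true" and "false" values (the kernel parses the value with `kstrtobool()`, which accepts other spellings as well), ignores quoting, truncates very long command lines, and lets the last well-formed occurrence win.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Resolve the effective favordynmods setting: start from the build-time
 * default (CONFIG_CGROUP_FAVOR_DYNMODS) and let a "cgroup_favordynmods="
 * option with a recognised value override it.
 */
static bool favordynmods_effective(const char *cmdline, bool config_default)
{
	static const char key[] = "cgroup_favordynmods=";
	const size_t keylen = sizeof(key) - 1;
	char buf[1024];		/* longer command lines are truncated */
	bool val = config_default;

	assert(cmdline != NULL);
	/* strtok() modifies its input, so tokenize a bounded copy. */
	snprintf(buf, sizeof(buf), "%s", cmdline);
	for (char *tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
		if (strncmp(tok, key, keylen) != 0)
			continue;
		if (strcmp(tok + keylen, "true") == 0)
			val = true;
		else if (strcmp(tok + keylen, "false") == 0)
			val = false;
		/* Unrecognised values leave the current setting untouched. */
	}
	return val;
}
```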

kernel/cgroup/cgroup.c

Lines changed: 21 additions & 9 deletions
@@ -207,6 +207,8 @@ static u16 have_exit_callback __read_mostly;
 static u16 have_release_callback __read_mostly;
 static u16 have_canfork_callback __read_mostly;
 
+static bool have_favordynmods __ro_after_init = IS_ENABLED(CONFIG_CGROUP_FAVOR_DYNMODS);
+
 /* cgroup namespace for init task */
 struct cgroup_namespace init_cgroup_ns = {
 	.ns.count = REFCOUNT_INIT(2),
@@ -1350,7 +1352,9 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 		cgroup_root_count--;
 	}
 
-	cgroup_favor_dynmods(root, false);
+	if (!have_favordynmods)
+		cgroup_favor_dynmods(root, false);
+
 	cgroup_exit_root_id(root);
 
 	cgroup_unlock();
@@ -1719,20 +1723,22 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
 
 	if (!css->ss) {
 		if (cgroup_on_dfl(cgrp)) {
-			ret = cgroup_addrm_files(&cgrp->self, cgrp,
+			ret = cgroup_addrm_files(css, cgrp,
 						 cgroup_base_files, true);
 			if (ret < 0)
 				return ret;
 
 			if (cgroup_psi_enabled()) {
-				ret = cgroup_addrm_files(&cgrp->self, cgrp,
+				ret = cgroup_addrm_files(css, cgrp,
 							 cgroup_psi_files, true);
 				if (ret < 0)
 					return ret;
 			}
 		} else {
-			cgroup_addrm_files(css, cgrp,
-					   cgroup1_base_files, true);
+			ret = cgroup_addrm_files(css, cgrp,
+						 cgroup1_base_files, true);
+			if (ret < 0)
+				return ret;
 		}
 	} else {
 		list_for_each_entry(cfts, &css->ss->cfts, node) {
@@ -2255,9 +2261,9 @@ static int cgroup_init_fs_context(struct fs_context *fc)
 	fc->user_ns = get_user_ns(ctx->ns->user_ns);
 	fc->global = true;
 
-#ifdef CONFIG_CGROUP_FAVOR_DYNMODS
-	ctx->flags |= CGRP_ROOT_FAVOR_DYNMODS;
-#endif
+	if (have_favordynmods)
+		ctx->flags |= CGRP_ROOT_FAVOR_DYNMODS;
+
 	return 0;
 }
 
@@ -6139,7 +6145,7 @@ int __init cgroup_init(void)
 
 		if (cgroup1_ssid_disabled(ssid))
 			pr_info("Disabling %s control group subsystem in v1 mounts\n",
-				ss->name);
+				ss->legacy_name);
 
 		cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
@@ -6782,6 +6788,12 @@ static int __init enable_cgroup_debug(char *str)
 }
 __setup("cgroup_debug", enable_cgroup_debug);
 
+static int __init cgroup_favordynmods_setup(char *str)
+{
+	return (kstrtobool(str, &have_favordynmods) == 0);
+}
+__setup("cgroup_favordynmods=", cgroup_favordynmods_setup);
+
 /**
  * css_tryget_online_from_dir - get corresponding css from a cgroup dentry
  * @dentry: directory dentry of interest
