Commit 7fd4da9

Waiman-Long authored and htejun committed
cgroup/cpuset: Optimize cpuset_attach() on v2
It was found that with the default hierarchy, enabling cpuset in the child cgroups can trigger a cpuset_attach() call in each of the child cgroups that have tasks, even when there is no change in effective cpus and mems. If there are many processes in those child cgroups, iterating all the tasks burns quite a lot of CPU cycles without doing useful work. Optimize this case by comparing the old and new cpusets and skipping the useless update if there is no change in effective cpus and mems.

Also, mems_allowed is less likely to change than cpus_allowed, so skip changing mm if there is no change in effective_mems and CS_MEMORY_MIGRATE is not set.

By inserting some instrumentation code and running a simple command in a container 200 times on a cgroup v2 system, it was found that all the cpuset_attach() calls (401 in total) were skipped, as there was no change in effective cpus and mems.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
1 parent 18f9a4d commit 7fd4da9

File tree

1 file changed

+23
-1
lines changed


kernel/cgroup/cpuset.c

Lines changed: 23 additions & 1 deletion
@@ -2513,12 +2513,28 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	struct cgroup_subsys_state *css;
 	struct cpuset *cs;
 	struct cpuset *oldcs = cpuset_attach_old_cs;
+	bool cpus_updated, mems_updated;
 
 	cgroup_taskset_first(tset, &css);
 	cs = css_cs(css);
 
 	lockdep_assert_cpus_held();	/* see cgroup_attach_lock() */
 	percpu_down_write(&cpuset_rwsem);
+	cpus_updated = !cpumask_equal(cs->effective_cpus,
+				      oldcs->effective_cpus);
+	mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
+
+	/*
+	 * In the default hierarchy, enabling cpuset in the child cgroups
+	 * will trigger a number of cpuset_attach() calls with no change
+	 * in effective cpus and mems. In that case, we can optimize out
+	 * by skipping the task iteration and update.
+	 */
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) &&
+	    !cpus_updated && !mems_updated) {
+		cpuset_attach_nodemask_to = cs->effective_mems;
+		goto out;
+	}
 
 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 
@@ -2539,9 +2555,14 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 
 	/*
 	 * Change mm for all threadgroup leaders. This is expensive and may
-	 * sleep and should be moved outside migration path proper.
+	 * sleep and should be moved outside migration path proper. Skip it
+	 * if there is no change in effective_mems and CS_MEMORY_MIGRATE is
+	 * not set.
 	 */
 	cpuset_attach_nodemask_to = cs->effective_mems;
+	if (!is_memory_migrate(cs) && !mems_updated)
+		goto out;
+
 	cgroup_taskset_for_each_leader(leader, css, tset) {
 		struct mm_struct *mm = get_task_mm(leader);
 
@@ -2564,6 +2585,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 		}
 	}
 
+out:
 	cs->old_mems_allowed = cpuset_attach_nodemask_to;
 
 	cs->attach_in_progress--;