Skip to content

Commit 126310c

Browse files
kudureranganathgregkh
authored andcommitted
drivers: base: cacheinfo: Fix shared_cpu_map changes in event of CPU hotplug
While building the shared_cpu_map, check if the cache level and cache type matches. On certain systems that build the cache topology based on the instance ID, there are cases where the same ID may repeat across multiple cache levels, leading inaccurate topology. In event of CPU offlining, the cache_shared_cpu_map_remove() does not consider if IDs at same level are being compared. As a result, when same IDs repeat across different cache levels, the CPU going offline is not removed from all the shared_cpu_map. Below is the output of cache topology of CPU8 and it's SMT sibling after CPU8 is offlined on a dual socket 3rd Generation AMD EPYC processor (2 x 64C/128T) running kernel release v6.3: # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143 # echo 0 > /sys/devices/system/cpu/cpu8/online # for i in /sys/devices/system/cpu/cpu136/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list: 136 /sys/devices/system/cpu/cpu136/cache/index1/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu136/cache/index2/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list: 9-15,136-143 CPU8 is removed from index0 (L1i) but remains in the shared_cpu_list of index1 (L1d) and index2 (L2). Since L1i, L1d, and L2 are shared by the SMT siblings, and they have the same cache instance ID, CPU 2 is only removed from the first index with matching ID which is index1 (L1i) in this case. With this fix, the results are as expected when performing the same experiment on the same system: # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136 /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143 # echo 0 > /sys/devices/system/cpu/cpu8/online # for i in /sys/devices/system/cpu/cpu136/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list: 136 /sys/devices/system/cpu/cpu136/cache/index1/shared_cpu_list: 136 /sys/devices/system/cpu/cpu136/cache/index2/shared_cpu_list: 136 /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list: 9-15,136-143 When rebuilding topology, the same problem appears as cache_shared_cpu_map_setup() implements a similar logic. Consider the same 3rd Generation EPYC processor: CPUs in Core 1, that share the L1 and L2 caches, have L1 and L2 instance ID as 1. For all the CPUs on the second chiplet, the L3 ID is also 1 leading to grouping on CPUs from Core 1 (1, 17) and the entire second chiplet (8-15, 24-31) as CPUs sharing one cache domain. This went undetected since x86 processors depended on arch specific populate_cache_leaves() method to repopulate the shared_cpus_map when CPU came back online until kernel release v6.3-rc5. Fixes: 198102c ("cacheinfo: Fix shared_cpu_map to handle shared caches at different levels") Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20230508084115.1157-2-kprateek.nayak@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1 parent 7877cb9 commit 126310c

File tree

1 file changed

+20
-0
lines changed

1 file changed

+20
-0
lines changed

drivers/base/cacheinfo.c

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -388,6 +388,16 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
388388
continue;/* skip if itself or no cacheinfo */
389389
for (sib_index = 0; sib_index < cache_leaves(i); sib_index++) {
390390
sib_leaf = per_cpu_cacheinfo_idx(i, sib_index);
391+
392+
/*
393+
* Comparing cache IDs only makes sense if the leaves
394+
* belong to the same cache level of same type. Skip
395+
* the check if level and type do not match.
396+
*/
397+
if (sib_leaf->level != this_leaf->level ||
398+
sib_leaf->type != this_leaf->type)
399+
continue;
400+
391401
if (cache_leaves_are_shared(this_leaf, sib_leaf)) {
392402
cpumask_set_cpu(cpu, &sib_leaf->shared_cpu_map);
393403
cpumask_set_cpu(i, &this_leaf->shared_cpu_map);
@@ -419,6 +429,16 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
419429

420430
for (sib_index = 0; sib_index < cache_leaves(sibling); sib_index++) {
421431
sib_leaf = per_cpu_cacheinfo_idx(sibling, sib_index);
432+
433+
/*
434+
* Comparing cache IDs only makes sense if the leaves
435+
* belong to the same cache level of same type. Skip
436+
* the check if level and type do not match.
437+
*/
438+
if (sib_leaf->level != this_leaf->level ||
439+
sib_leaf->type != this_leaf->type)
440+
continue;
441+
422442
if (cache_leaves_are_shared(this_leaf, sib_leaf)) {
423443
cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
424444
cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);

0 commit comments

Comments
 (0)