L3 cache topology information on Intel chips with NUMA architecture is not accurate #402

@LavenderQAQ

I ran the topology example on an INTEL(R) XEON(R) GOLD 6542Y, but its output does not match the results of lscpu -e.

The CPU information (lscpu output) is as follows:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        4
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel(R) Corporation
CPU family:          6
Model:               207
Model name:          INTEL(R) XEON(R) GOLD 6542Y
BIOS Model name:     INTEL(R) XEON(R) GOLD 6542Y
Stepping:            2
CPU MHz:             3564.866
CPU max MHz:         2901.0000
CPU min MHz:         800.0000
BogoMIPS:            5800.00
L1d cache:           48K
L1i cache:           32K
L2 cache:            2048K
L3 cache:            61440K
NUMA node0 CPU(s):   0-11,48-59
NUMA node1 CPU(s):   12-23,60-71
NUMA node2 CPU(s):   24-35,72-83
NUMA node3 CPU(s):   36-47,84-95
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hfi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm uintr md_clear serialize tsxldtrk pconfig arch_lbr amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities

Output of the ghw topology example:

topology NUMA (4 nodes)
 node #0 (12 cores)
  L1i cache (32 KB) shared with logical processors: 0,48
  L1i cache (32 KB) shared with logical processors: 1,49
  L1i cache (32 KB) shared with logical processors: 2,50
  L1i cache (32 KB) shared with logical processors: 3,51
  L1i cache (32 KB) shared with logical processors: 4,52
  L1i cache (32 KB) shared with logical processors: 5,53
  L1i cache (32 KB) shared with logical processors: 6,54
  L1i cache (32 KB) shared with logical processors: 7,55
  L1i cache (32 KB) shared with logical processors: 8,56
  L1i cache (32 KB) shared with logical processors: 9,57
  L1i cache (32 KB) shared with logical processors: 10,58
  L1i cache (32 KB) shared with logical processors: 11,59
  L1d cache (32 KB) shared with logical processors: 0,48
  L1d cache (32 KB) shared with logical processors: 1,49
  L1d cache (32 KB) shared with logical processors: 2,50
  L1d cache (32 KB) shared with logical processors: 3,51
  L1d cache (32 KB) shared with logical processors: 4,52
  L1d cache (32 KB) shared with logical processors: 5,53
  L1d cache (32 KB) shared with logical processors: 6,54
  L1d cache (32 KB) shared with logical processors: 7,55
  L1d cache (32 KB) shared with logical processors: 8,56
  L1d cache (32 KB) shared with logical processors: 9,57
  L1d cache (32 KB) shared with logical processors: 10,58
  L1d cache (32 KB) shared with logical processors: 11,59
  L2 cache (2048 KB) shared with logical processors: 0,48
  L2 cache (2048 KB) shared with logical processors: 1,49
  L2 cache (2048 KB) shared with logical processors: 2,50
  L2 cache (2048 KB) shared with logical processors: 3,51
  L2 cache (2048 KB) shared with logical processors: 4,52
  L2 cache (2048 KB) shared with logical processors: 5,53
  L2 cache (2048 KB) shared with logical processors: 6,54
  L2 cache (2048 KB) shared with logical processors: 7,55
  L2 cache (2048 KB) shared with logical processors: 8,56
  L2 cache (2048 KB) shared with logical processors: 9,57
  L2 cache (2048 KB) shared with logical processors: 10,58
  L2 cache (2048 KB) shared with logical processors: 11,59
  L3 cache (61440 KB) shared with logical processors: 0,1,2,3,4,5,6,7,8,9,10,11,48,49,50,51,52,53,54,55,56,57,58,59
 node #1 (12 cores)
  L1i cache (32 KB) shared with logical processors: 12,60
  L1i cache (32 KB) shared with logical processors: 13,61
  L1i cache (32 KB) shared with logical processors: 14,62
  L1i cache (32 KB) shared with logical processors: 15,63
  L1i cache (32 KB) shared with logical processors: 16,64
  L1i cache (32 KB) shared with logical processors: 17,65
  L1i cache (32 KB) shared with logical processors: 18,66
  L1i cache (32 KB) shared with logical processors: 19,67
  L1i cache (32 KB) shared with logical processors: 20,68
  L1i cache (32 KB) shared with logical processors: 21,69
  L1i cache (32 KB) shared with logical processors: 22,70
  L1i cache (32 KB) shared with logical processors: 23,71
  L1d cache (32 KB) shared with logical processors: 12,60
  L1d cache (32 KB) shared with logical processors: 13,61
  L1d cache (32 KB) shared with logical processors: 14,62
  L1d cache (32 KB) shared with logical processors: 15,63
  L1d cache (32 KB) shared with logical processors: 16,64
  L1d cache (32 KB) shared with logical processors: 17,65
  L1d cache (32 KB) shared with logical processors: 18,66
  L1d cache (32 KB) shared with logical processors: 19,67
  L1d cache (32 KB) shared with logical processors: 20,68
  L1d cache (32 KB) shared with logical processors: 21,69
  L1d cache (32 KB) shared with logical processors: 22,70
  L1d cache (32 KB) shared with logical processors: 23,71
  L2 cache (2048 KB) shared with logical processors: 12,60
  L2 cache (2048 KB) shared with logical processors: 13,61
  L2 cache (2048 KB) shared with logical processors: 14,62
  L2 cache (2048 KB) shared with logical processors: 15,63
  L2 cache (2048 KB) shared with logical processors: 16,64
  L2 cache (2048 KB) shared with logical processors: 17,65
  L2 cache (2048 KB) shared with logical processors: 18,66
  L2 cache (2048 KB) shared with logical processors: 19,67
  L2 cache (2048 KB) shared with logical processors: 20,68
  L2 cache (2048 KB) shared with logical processors: 21,69
  L2 cache (2048 KB) shared with logical processors: 22,70
  L2 cache (2048 KB) shared with logical processors: 23,71
  L3 cache (61440 KB) shared with logical processors: 12,13,14,15,16,17,18,19,20,21,22,23,60,61,62,63,64,65,66,67,68,69,70,71
 node #2 (12 cores)
  L1i cache (32 KB) shared with logical processors: 24,72
  L1i cache (32 KB) shared with logical processors: 25,73
  L1i cache (32 KB) shared with logical processors: 26,74
  L1i cache (32 KB) shared with logical processors: 27,75
  L1i cache (32 KB) shared with logical processors: 28,76
  L1i cache (32 KB) shared with logical processors: 29,77
  L1i cache (32 KB) shared with logical processors: 30,78
  L1i cache (32 KB) shared with logical processors: 31,79
  L1i cache (32 KB) shared with logical processors: 32,80
  L1i cache (32 KB) shared with logical processors: 33,81
  L1i cache (32 KB) shared with logical processors: 34,82
  L1i cache (32 KB) shared with logical processors: 35,83
  L1d cache (32 KB) shared with logical processors: 24,72
  L1d cache (32 KB) shared with logical processors: 25,73
  L1d cache (32 KB) shared with logical processors: 26,74
  L1d cache (32 KB) shared with logical processors: 27,75
  L1d cache (32 KB) shared with logical processors: 28,76
  L1d cache (32 KB) shared with logical processors: 29,77
  L1d cache (32 KB) shared with logical processors: 30,78
  L1d cache (32 KB) shared with logical processors: 31,79
  L1d cache (32 KB) shared with logical processors: 32,80
  L1d cache (32 KB) shared with logical processors: 33,81
  L1d cache (32 KB) shared with logical processors: 34,82
  L1d cache (32 KB) shared with logical processors: 35,83
  L2 cache (2048 KB) shared with logical processors: 24,72
  L2 cache (2048 KB) shared with logical processors: 25,73
  L2 cache (2048 KB) shared with logical processors: 26,74
  L2 cache (2048 KB) shared with logical processors: 27,75
  L2 cache (2048 KB) shared with logical processors: 28,76
  L2 cache (2048 KB) shared with logical processors: 29,77
  L2 cache (2048 KB) shared with logical processors: 30,78
  L2 cache (2048 KB) shared with logical processors: 31,79
  L2 cache (2048 KB) shared with logical processors: 32,80
  L2 cache (2048 KB) shared with logical processors: 33,81
  L2 cache (2048 KB) shared with logical processors: 34,82
  L2 cache (2048 KB) shared with logical processors: 35,83
  L3 cache (61440 KB) shared with logical processors: 24,25,26,27,28,29,30,31,32,33,34,35,72,73,74,75,76,77,78,79,80,81,82,83
 node #3 (12 cores)
  L1i cache (32 KB) shared with logical processors: 36,84
  L1i cache (32 KB) shared with logical processors: 37,85
  L1i cache (32 KB) shared with logical processors: 38,86
  L1i cache (32 KB) shared with logical processors: 39,87
  L1i cache (32 KB) shared with logical processors: 40,88
  L1i cache (32 KB) shared with logical processors: 41,89
  L1i cache (32 KB) shared with logical processors: 42,90
  L1i cache (32 KB) shared with logical processors: 43,91
  L1i cache (32 KB) shared with logical processors: 44,92
  L1i cache (32 KB) shared with logical processors: 45,93
  L1i cache (32 KB) shared with logical processors: 46,94
  L1i cache (32 KB) shared with logical processors: 47,95
  L1d cache (32 KB) shared with logical processors: 36,84
  L1d cache (32 KB) shared with logical processors: 37,85
  L1d cache (32 KB) shared with logical processors: 38,86
  L1d cache (32 KB) shared with logical processors: 39,87
  L1d cache (32 KB) shared with logical processors: 40,88
  L1d cache (32 KB) shared with logical processors: 41,89
  L1d cache (32 KB) shared with logical processors: 42,90
  L1d cache (32 KB) shared with logical processors: 43,91
  L1d cache (32 KB) shared with logical processors: 44,92
  L1d cache (32 KB) shared with logical processors: 45,93
  L1d cache (32 KB) shared with logical processors: 46,94
  L1d cache (32 KB) shared with logical processors: 47,95
  L2 cache (2048 KB) shared with logical processors: 36,84
  L2 cache (2048 KB) shared with logical processors: 37,85
  L2 cache (2048 KB) shared with logical processors: 38,86
  L2 cache (2048 KB) shared with logical processors: 39,87
  L2 cache (2048 KB) shared with logical processors: 40,88
  L2 cache (2048 KB) shared with logical processors: 41,89
  L2 cache (2048 KB) shared with logical processors: 42,90
  L2 cache (2048 KB) shared with logical processors: 43,91
  L2 cache (2048 KB) shared with logical processors: 44,92
  L2 cache (2048 KB) shared with logical processors: 45,93
  L2 cache (2048 KB) shared with logical processors: 46,94
  L2 cache (2048 KB) shared with logical processors: 47,95
  L3 cache (61440 KB) shared with logical processors: 36,37,38,39,40,41,42,43,44,45,46,47,84,85,86,87,88,89,90,91,92,93,94,95

The output of lscpu -e, whose L3 column matches the result of running cat /sys/devices/system/cpu/cpu<x>/cache/index3/id for each CPU:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       yes    2901.0000 800.0000
1   0    0      1    1:1:1:0       yes    2901.0000 800.0000
2   0    0      2    2:2:2:0       yes    2901.0000 800.0000
3   0    0      3    3:3:3:0       yes    2901.0000 800.0000
4   0    0      4    4:4:4:0       yes    2901.0000 800.0000
5   0    0      5    5:5:5:0       yes    2901.0000 800.0000
6   0    0      6    6:6:6:0       yes    2901.0000 800.0000
7   0    0      7    7:7:7:0       yes    2901.0000 800.0000
8   0    0      8    8:8:8:0       yes    2901.0000 800.0000
9   0    0      9    9:9:9:0       yes    2901.0000 800.0000
10  0    0      10   10:10:10:0    yes    2901.0000 800.0000
11  0    0      11   11:11:11:0    yes    2901.0000 800.0000
12  1    0      12   12:12:12:0    yes    2901.0000 800.0000
13  1    0      13   13:13:13:0    yes    2901.0000 800.0000
14  1    0      14   14:14:14:0    yes    2901.0000 800.0000
15  1    0      15   15:15:15:0    yes    2901.0000 800.0000
16  1    0      16   16:16:16:0    yes    2901.0000 800.0000
17  1    0      17   17:17:17:0    yes    2901.0000 800.0000
18  1    0      18   18:18:18:0    yes    2901.0000 800.0000
19  1    0      19   19:19:19:0    yes    2901.0000 800.0000
20  1    0      20   20:20:20:0    yes    2901.0000 800.0000
21  1    0      21   21:21:21:0    yes    2901.0000 800.0000
22  1    0      22   22:22:22:0    yes    2901.0000 800.0000
23  1    0      23   23:23:23:0    yes    2901.0000 800.0000
24  2    1      24   24:24:24:1    yes    2901.0000 800.0000
25  2    1      25   25:25:25:1    yes    2901.0000 800.0000
26  2    1      26   26:26:26:1    yes    2901.0000 800.0000
27  2    1      27   27:27:27:1    yes    2901.0000 800.0000
28  2    1      28   28:28:28:1    yes    2901.0000 800.0000
29  2    1      29   29:29:29:1    yes    2901.0000 800.0000
30  2    1      30   30:30:30:1    yes    2901.0000 800.0000
31  2    1      31   31:31:31:1    yes    2901.0000 800.0000
32  2    1      32   32:32:32:1    yes    2901.0000 800.0000
33  2    1      33   33:33:33:1    yes    2901.0000 800.0000
34  2    1      34   34:34:34:1    yes    2901.0000 800.0000
35  2    1      35   35:35:35:1    yes    2901.0000 800.0000
36  3    1      36   36:36:36:1    yes    2901.0000 800.0000
37  3    1      37   37:37:37:1    yes    2901.0000 800.0000
38  3    1      38   38:38:38:1    yes    2901.0000 800.0000
39  3    1      39   39:39:39:1    yes    2901.0000 800.0000
40  3    1      40   40:40:40:1    yes    2901.0000 800.0000
41  3    1      41   41:41:41:1    yes    2901.0000 800.0000
42  3    1      42   42:42:42:1    yes    2901.0000 800.0000
43  3    1      43   43:43:43:1    yes    2901.0000 800.0000
44  3    1      44   44:44:44:1    yes    2901.0000 800.0000
45  3    1      45   45:45:45:1    yes    2901.0000 800.0000
46  3    1      46   46:46:46:1    yes    2901.0000 800.0000
47  3    1      47   47:47:47:1    yes    2901.0000 800.0000
48  0    0      0    0:0:0:0       yes    2901.0000 800.0000
49  0    0      1    1:1:1:0       yes    2901.0000 800.0000
50  0    0      2    2:2:2:0       yes    2901.0000 800.0000
51  0    0      3    3:3:3:0       yes    2901.0000 800.0000
52  0    0      4    4:4:4:0       yes    2901.0000 800.0000
53  0    0      5    5:5:5:0       yes    2901.0000 800.0000
54  0    0      6    6:6:6:0       yes    2901.0000 800.0000
55  0    0      7    7:7:7:0       yes    2901.0000 800.0000
56  0    0      8    8:8:8:0       yes    2901.0000 800.0000
57  0    0      9    9:9:9:0       yes    2901.0000 800.0000
58  0    0      10   10:10:10:0    yes    2901.0000 800.0000
59  0    0      11   11:11:11:0    yes    2901.0000 800.0000
60  1    0      12   12:12:12:0    yes    2901.0000 800.0000
61  1    0      13   13:13:13:0    yes    2901.0000 800.0000
62  1    0      14   14:14:14:0    yes    2901.0000 800.0000
63  1    0      15   15:15:15:0    yes    2901.0000 800.0000
64  1    0      16   16:16:16:0    yes    2901.0000 800.0000
65  1    0      17   17:17:17:0    yes    2901.0000 800.0000
66  1    0      18   18:18:18:0    yes    2901.0000 800.0000
67  1    0      19   19:19:19:0    yes    2901.0000 800.0000
68  1    0      20   20:20:20:0    yes    2901.0000 800.0000
69  1    0      21   21:21:21:0    yes    2901.0000 800.0000
70  1    0      22   22:22:22:0    yes    2901.0000 800.0000
71  1    0      23   23:23:23:0    yes    2901.0000 800.0000
72  2    1      24   24:24:24:1    yes    2901.0000 800.0000
73  2    1      25   25:25:25:1    yes    2901.0000 800.0000
74  2    1      26   26:26:26:1    yes    2901.0000 800.0000
75  2    1      27   27:27:27:1    yes    2901.0000 800.0000
76  2    1      28   28:28:28:1    yes    2901.0000 800.0000
77  2    1      29   29:29:29:1    yes    2901.0000 800.0000
78  2    1      30   30:30:30:1    yes    2901.0000 800.0000
79  2    1      31   31:31:31:1    yes    2901.0000 800.0000
80  2    1      32   32:32:32:1    yes    2901.0000 800.0000
81  2    1      33   33:33:33:1    yes    2901.0000 800.0000
82  2    1      34   34:34:34:1    yes    2901.0000 800.0000
83  2    1      35   35:35:35:1    yes    2901.0000 800.0000
84  3    1      36   36:36:36:1    yes    2901.0000 800.0000
85  3    1      37   37:37:37:1    yes    2901.0000 800.0000
86  3    1      38   38:38:38:1    yes    2901.0000 800.0000
87  3    1      39   39:39:39:1    yes    2901.0000 800.0000
88  3    1      40   40:40:40:1    yes    2901.0000 800.0000
89  3    1      41   41:41:41:1    yes    2901.0000 800.0000
90  3    1      42   42:42:42:1    yes    2901.0000 800.0000
91  3    1      43   43:43:43:1    yes    2901.0000 800.0000
92  3    1      44   44:44:44:1    yes    2901.0000 800.0000
93  3    1      45   45:45:45:1    yes    2901.0000 800.0000
94  3    1      46   46:46:46:1    yes    2901.0000 800.0000
95  3    1      47   47:47:47:1    yes    2901.0000 800.0000

It looks like this happens because the code below deduplicates caches only within each NUMA node, so an L3 cache shared by two nodes is reported once per node. Could we solve this by adding an id field to the Cache type and merging caches that share the same id?

	// Inspect the caches for each logical processor. There will be a
	// /sys/devices/system/node/nodeX/cpuX/cache directory containing a
	// number of directories beginning with the prefix "index" followed by
	// a number. The number indicates the level of the cache, which
	// indicates the "distance" from the processor. Each of these
	// directories contains information about the size of that level of
	// cache and the processors mapped to it.
	cachePath := filepath.Join(cpuPath, "cache")
	if _, err = os.Stat(cachePath); errors.Is(err, os.ErrNotExist) {
		continue
	}
	cacheDirFiles, err := os.ReadDir(cachePath)
	if err != nil {
		return nil, err
	}
	for _, cacheDirFile := range cacheDirFiles {
		cacheDirFileName := cacheDirFile.Name()
		if !strings.HasPrefix(cacheDirFileName, "index") {
			continue
		}
		cacheIndex, _ := strconv.Atoi(cacheDirFileName[5:])

		// The cache information is repeated for each node, so here, we
		// just ensure that we only have a one Cache object for each
		// unique combination of level, type and processor map
		level := memoryCacheLevel(ctx, paths, nodeID, lpID, cacheIndex)
		cacheType := memoryCacheType(ctx, paths, nodeID, lpID, cacheIndex)
		sharedCpuMap := memoryCacheSharedCPUMap(ctx, paths, nodeID, lpID, cacheIndex)
		cacheKey := fmt.Sprintf("%d-%d-%s", level, cacheType, sharedCpuMap)

		cache, exists := caches[cacheKey]
		if !exists {
			size := memoryCacheSize(ctx, paths, nodeID, lpID, level)
			cache = &Cache{
				Level:             uint8(level),
				Type:              cacheType,
				SizeBytes:         uint64(size) * uint64(unitutil.KB),
				LogicalProcessors: make([]uint32, 0),
			}
			caches[cacheKey] = cache
		}
		cache.LogicalProcessors = append(
			cache.LogicalProcessors,
			uint32(lpID),
		)
	}

The current output gives the impression that the machine has four L3 caches, whereas lscpu -e reports only two (one per socket), so the ghw result looks wrong.
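
To make the proposal concrete, here is a minimal sketch of the merge step. It is not ghw's implementation: the Cache struct is a simplified, hypothetical stand-in for ghw's type with an added ID field (the value of the sysfs id file), the Type field is a plain string for brevity, and mergeCaches is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// Cache is a simplified, hypothetical stand-in for ghw's Cache type with an
// added ID field holding the value of the sysfs cache "id" file.
type Cache struct {
	Level             uint8
	Type              string
	ID                int
	SizeBytes         uint64
	LogicalProcessors []uint32
}

// mergeCaches combines caches discovered per NUMA node so that each physical
// cache, identified by (level, type, id), appears once with the union of the
// logical processors reported for it. It assumes a logical processor is
// listed under at most one node, so no deduplication of IDs is performed.
func mergeCaches(perNode [][]*Cache) []*Cache {
	merged := make(map[string]*Cache)
	var order []string // first-seen order of keys, for stable output
	for _, nodeCaches := range perNode {
		for _, c := range nodeCaches {
			key := fmt.Sprintf("%d-%s-%d", c.Level, c.Type, c.ID)
			m, ok := merged[key]
			if !ok {
				m = &Cache{Level: c.Level, Type: c.Type, ID: c.ID, SizeBytes: c.SizeBytes}
				merged[key] = m
				order = append(order, key)
			}
			m.LogicalProcessors = append(m.LogicalProcessors, c.LogicalProcessors...)
		}
	}
	out := make([]*Cache, 0, len(order))
	for _, key := range order {
		m := merged[key]
		sort.Slice(m.LogicalProcessors, func(i, j int) bool {
			return m.LogicalProcessors[i] < m.LogicalProcessors[j]
		})
		out = append(out, m)
	}
	return out
}

func main() {
	const l3Size = 61440 << 10 // 61440 KB, as reported above
	// One L3 entry per NUMA node, with the processor lists abbreviated.
	perNode := [][]*Cache{
		{{Level: 3, Type: "Unified", ID: 0, SizeBytes: l3Size, LogicalProcessors: []uint32{0, 48}}},
		{{Level: 3, Type: "Unified", ID: 0, SizeBytes: l3Size, LogicalProcessors: []uint32{12, 60}}},
		{{Level: 3, Type: "Unified", ID: 1, SizeBytes: l3Size, LogicalProcessors: []uint32{24, 72}}},
		{{Level: 3, Type: "Unified", ID: 1, SizeBytes: l3Size, LogicalProcessors: []uint32{36, 84}}},
	}
	for _, c := range mergeCaches(perNode) {
		fmt.Printf("L%d id=%d lps=%v\n", c.Level, c.ID, c.LogicalProcessors)
	}
}
```

With this sample input the four per-node entries collapse into two L3 caches, one per socket. A merged cache then spans two NUMA nodes, so where it should be attached in the topology tree (for example, listed under both nodes while pointing at the same object) remains an open design question.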
