This repository was archived by the owner on Nov 8, 2023. It is now read-only.

Commit b84b338

Merge tag 'x86_cache_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:

 - Enable Sub-NUMA clustering to work with resource control on Intel by
   teaching resctrl to handle scopes due to the clustering which
   partitions the L3 cache into sets. Modify and extend the subsystem to
   handle such scopes properly

* tag 'x86_cache_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Update documentation with Sub-NUMA cluster changes
  x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode
  x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
  x86/resctrl: Make __mon_event_count() handle sum domains
  x86/resctrl: Fill out rmid_read structure for smp_call*() to read a counter
  x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC) mode
  x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files
  x86/resctrl: Allocate a new field in union mon_data_bits
  x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function
  x86/resctrl: Initialize on-stack struct rmid_read instances
  x86/resctrl: Add a new field to struct rmid_read for summation of domains
  x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files
  x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems
  x86/resctrl: Introduce snc_nodes_per_l3_cache
  x86/resctrl: Add node-scope to the options for feature scope
  x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
  x86/resctrl: Prepare for different scope for control/monitor operations
  x86/resctrl: Prepare to split rdt_domain structure
  x86/resctrl: Prepare for new domain scope
2 parents d679783 + ea34999 commit b84b338

9 files changed, 855 insertions(+), 323 deletions(-)

Documentation/arch/x86/resctrl.rst

Lines changed: 27 additions & 0 deletions
@@ -375,6 +375,10 @@ When monitoring is enabled all MON groups will also contain:
 all tasks in the group. In CTRL_MON groups these files provide
 the sum for all tasks in the CTRL_MON group and all tasks in
 MON groups. Please see example section for more details on usage.
+On systems with Sub-NUMA Cluster (SNC) enabled there are extra
+directories for each node (located within the "mon_L3_XX" directory
+for the L3 cache they occupy). These are named "mon_sub_L3_YY"
+where "YY" is the node number.

 "mon_hw_id":
 Available only with debug option. The identifier used by hardware
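
The directory layout added in the hunk above can be walked from userspace. The following is a minimal sketch, not part of the commit: it assumes resctrl is mounted at /sys/fs/resctrl and that a group's monitoring files live under its "mon_data" directory. The helper names `read_event` and `read_snc_counters` are hypothetical, and any non-numeric file content is simply reported as `None`.

```python
from pathlib import Path

# Event file names as listed in the resctrl documentation.
EVENTS = ("llc_occupancy", "mbm_total_bytes", "mbm_local_bytes")


def read_event(path):
    """Return the counter in `path` as an int, or None if it is
    missing or does not contain a plain decimal number."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        return None
    return int(text) if text.isdigit() else None


def read_snc_counters(mon_data_dir):
    """Map every mon_L3_XX directory to its L3-wide counters ("total")
    and to the counters of each mon_sub_L3_YY directory ("nodes").

    `mon_data_dir` is an assumption: e.g. /sys/fs/resctrl/mon_data.
    """
    result = {}
    for l3_dir in sorted(Path(mon_data_dir).glob("mon_L3_*")):
        if not l3_dir.is_dir():
            continue
        nodes = {
            sub.name: {ev: read_event(sub / ev) for ev in EVENTS}
            for sub in sorted(l3_dir.glob("mon_sub_L3_*"))
            if sub.is_dir()
        }
        result[l3_dir.name] = {
            "total": {ev: read_event(l3_dir / ev) for ev in EVENTS},
            "nodes": nodes,
        }
    return result
```

Calling `read_snc_counters("/sys/fs/resctrl/mon_data")` on a system without SNC would be expected to return an empty "nodes" mapping for each L3 instance, since no "mon_sub_L3_YY" directories exist there.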
@@ -484,6 +488,29 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
 each bit represents 5% of the capacity of the cache. You could partition
 the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

+Notes on Sub-NUMA Cluster mode
+==============================
+When SNC mode is enabled, Linux may load balance tasks between Sub-NUMA
+nodes much more readily than between regular NUMA nodes since the CPUs
+on Sub-NUMA nodes share the same L3 cache and the system may report
+the NUMA distance between Sub-NUMA nodes with a lower value than used
+for regular NUMA nodes.
+
+The top-level monitoring files in each "mon_L3_XX" directory provide
+the sum of data across all SNC nodes sharing an L3 cache instance.
+Users who bind tasks to the CPUs of a specific Sub-NUMA node can read
+the "llc_occupancy", "mbm_total_bytes", and "mbm_local_bytes" in the
+"mon_sub_L3_YY" directories to get node local data.
+
+Memory bandwidth allocation is still performed at the L3 cache
+level. I.e. throttling controls are applied to all SNC nodes.
+
+L3 cache allocation bitmaps also apply to all SNC nodes. But note that
+the amount of L3 cache represented by each bit is divided by the number
+of SNC nodes per L3 cache. E.g. with a 100MB cache on a system with 10-bit
+allocation masks each bit normally represents 10MB. With SNC mode enabled
+with two SNC nodes per L3 cache, each bit only represents 5MB.
+
 Memory bandwidth Allocation and monitoring
 ==========================================
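
The bitmask arithmetic in this hunk (four equal parts of a 20-bit mask, and the per-bit capacity shrinking under SNC) can be checked with a short sketch. The function names `equal_partition_masks` and `mb_per_bit` are hypothetical and exist only for illustration; the code models the documented arithmetic and does not touch any resctrl file.

```python
def equal_partition_masks(mask_bits, parts):
    """Split a `mask_bits`-wide capacity bitmask into `parts`
    contiguous, non-overlapping masks of equal width."""
    if parts <= 0 or mask_bits % parts != 0:
        raise ValueError("mask_bits must be a positive multiple of parts")
    width = mask_bits // parts
    chunk = (1 << width) - 1          # `width` contiguous 1 bits
    return [chunk << (i * width) for i in range(parts)]


def mb_per_bit(cache_mb, mask_bits, snc_nodes_per_l3=1):
    """Cache capacity in MB represented by one bit of the allocation
    mask. Under SNC the per-bit amount is divided by the number of
    SNC nodes per L3 cache, as the documentation describes."""
    return cache_mb / mask_bits / snc_nodes_per_l3


# The documentation's examples, reproduced:
print([hex(m) for m in equal_partition_masks(20, 4)])
print(mb_per_bit(100, 10))       # SNC disabled
print(mb_per_bit(100, 10, 2))    # two SNC nodes per L3 cache
```

The masks are kept contiguous because the surrounding text only permits non-contiguous masks on hardware that advertises support for them.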

arch/x86/include/asm/msr-index.h

Lines changed: 1 addition & 0 deletions
@@ -1164,6 +1164,7 @@
 #define MSR_IA32_QM_CTR 0xc8e
 #define MSR_IA32_PQR_ASSOC 0xc8f
 #define MSR_IA32_L3_CBM_BASE 0xc90
+#define MSR_RMID_SNC_CONFIG 0xca0
 #define MSR_IA32_L2_CBM_BASE 0xd10
 #define MSR_IA32_MBA_THRTL_BASE 0xd50
