Commit f109e77

Gregory Price authored and davejiang committed
cxl: docs/allocation/reclaim
Document a bit about how reclaim interacts with various CXL configurations.

Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20250512162134.3596150-16-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
1 parent 419dc40 commit f109e77

2 files changed: +52 -0 lines changed
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+Reclaim
+=======
+Another way CXL memory can be utilized *indirectly* is via the reclaim system
+in :code:`mm/vmscan.c`. Reclaim is engaged when the system comes under memory
+pressure, as determined by global and cgroup-local `watermark` settings.
+
+This section does not discuss `watermark` configuration, only how CXL memory
+can be consumed by the various pieces of the reclaim system.
+
+Demotion
+========
+By default, the reclaim system will prefer swap (or zswap) when reclaiming
+memory. Enabling :code:`/sys/kernel/mm/numa/demotion_enabled` will cause vmscan
+to opportunistically demote pages to more distant NUMA nodes instead of sending
+them to swap or zswap, if capacity is available.
+
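As a rough illustration of the toggle described above, the following Python sketch reads and writes the demotion knob. The helper names are hypothetical, and the sketch assumes the knob lives at `/sys/kernel/mm/numa/demotion_enabled` and reads back as `true` or `false`; writing it requires root on a real system.

```python
from pathlib import Path

# Assumed sysfs location of the demotion knob.
DEMOTION_KNOB = Path("/sys/kernel/mm/numa/demotion_enabled")


def demotion_enabled(knob: Path = DEMOTION_KNOB) -> bool:
    """Return True if the knob currently reads as enabled."""
    return knob.read_text().strip().lower() in ("1", "true", "y", "yes", "on")


def set_demotion(enabled: bool, knob: Path = DEMOTION_KNOB) -> None:
    """Write the new state to the knob (needs root on a real system)."""
    knob.write_text("true\n" if enabled else "false\n")


if __name__ == "__main__":
    if DEMOTION_KNOB.exists():
        print("demotion enabled:", demotion_enabled())
    else:
        print("demotion knob not present on this system")
```

Passing the path as a parameter keeps the helpers usable against a scratch file, which is convenient when experimenting without root.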
+Demotion engages the :code:`mm/memory-tiers.c` component to determine the
+next demotion node. The next demotion node is chosen based on the :code:`HMAT`
+or :code:`CDAT` performance data.
+
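The idea of choosing a demotion target from performance data can be sketched with a toy model. This is not the kernel's algorithm: it assumes each node is described by a single read-latency figure (standing in for the HMAT or CDAT data), and the function name and latency values are hypothetical.

```python
from typing import Dict, Optional


def next_demotion_node(node: int, read_latency_ns: Dict[int, int]) -> Optional[int]:
    """Pick the fastest node that is strictly slower than `node`.

    Simplified model: nodes are ranked only by one latency number,
    which stands in for the HMAT/CDAT performance data.
    """
    own = read_latency_ns[node]
    slower = [n for n, lat in read_latency_ns.items() if lat > own]
    if not slower:
        return None  # already on the slowest node: nothing to demote to
    # Break latency ties by the lower node id to keep the result stable.
    return min(slower, key=lambda n: (read_latency_ns[n], n))


if __name__ == "__main__":
    # Hypothetical latencies: node 0 is DRAM, nodes 1 and 2 are CXL memory.
    latency = {0: 80, 1: 250, 2: 400}
    for n in sorted(latency):
        print(n, "->", next_demotion_node(n, latency))
```

In this model pages cascade from the fastest node toward the slowest one step at a time, and the slowest node has no demotion target, so reclaim there would have to fall back to swap or zswap.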
+cpusets.mems_allowed quirk
+--------------------------
+In Linux v6.15 and earlier, demotion does not respect :code:`cpusets.mems_allowed`
+when migrating pages. As a result, if demotion is enabled, vmscan cannot
+guarantee isolation of a container's memory from nodes not set in mems_allowed.
+
+In Linux v6.XX and later, demotion does attempt to respect
+:code:`cpusets.mems_allowed`; however, certain classes of shared memory
+originally instantiated by another cgroup (such as common libraries, e.g.
+libc) may still be demoted. As a result, the mems_allowed interface still
+cannot provide perfect isolation from the remote nodes.
+
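One way to observe the quirk described above is to compare the nodes a task is allowed to use with the nodes its pages actually occupy. The sketch below is an assumption-laden illustration: it relies on the `Mems_allowed_list:` field of `/proc/<pid>/status` and the `N<node>=<pages>` fields of `/proc/<pid>/numa_maps`, and all function names are hypothetical.

```python
import re
from typing import Set


def parse_nodelist(text: str) -> Set[int]:
    """Expand a kernel-style node list such as "0-1,4" into {0, 1, 4}."""
    nodes: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return nodes


def allowed_nodes(status_text: str) -> Set[int]:
    """Extract the allowed nodes from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Mems_allowed_list:"):
            return parse_nodelist(line.split(":", 1)[1])
    return set()


def resident_nodes(numa_maps_text: str) -> Set[int]:
    """Collect every node that holds pages, from N<node>=<pages> fields."""
    return {int(n) for n in re.findall(r"\bN(\d+)=\d+", numa_maps_text)}


def stray_nodes(status_text: str, numa_maps_text: str) -> Set[int]:
    """Nodes that hold pages but are not in the allowed set."""
    return resident_nodes(numa_maps_text) - allowed_nodes(status_text)


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as s, open("/proc/self/numa_maps") as m:
            print("nodes outside mems_allowed:", sorted(stray_nodes(s.read(), m.read())))
    except OSError as exc:
        print("could not read /proc files:", exc)
```

A non-empty result does not by itself prove demotion was responsible; it only shows that some mapped pages sit on nodes outside the allowed set.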
+ZSwap and Node Preference
+=========================
+In Linux v6.15 and earlier, ZSwap allocates memory from the local node of the
+processor for the new pages being compressed. Since pages being compressed
+are typically cold, the result is that a cold page is effectively promoted,
+only to be demoted again later as it ages off the LRU.
+
+In Linux v6.XX, ZSwap tries to prefer the node of the page being compressed
+as the allocation target for the compressed page. This helps prevent
+thrashing.
+
+Demotion with ZSwap
+===================
+Enabling both Demotion and ZSwap creates a situation where ZSwap will prefer
+the slowest form of CXL memory by default until that tier of memory is
+exhausted.

Documentation/driver-api/cxl/index.rst

Lines changed: 1 addition & 0 deletions
@@ -46,5 +46,6 @@ that have impacts on each other. The docs here break up configurations steps.
 
 allocation/dax
 allocation/page-allocator
+allocation/reclaim
 
 .. only:: subproject and html