Commit 36e9f71

Gregory Price authored and davejiang committed
cxl: docs/linux/dax-driver documentation
Add documentation on how the CXL driver interacts with the DAX driver.

Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20250512162134.3596150-12-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
1 parent ef3a43a commit 36e9f71

3 files changed: +149 -10 lines changed

Documentation/driver-api/cxl/index.rst

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ that have impacts on each other. The docs here break up configurations steps.
    linux/overview
    linux/early-boot
    linux/cxl-driver
+   linux/dax-driver
    linux/access-coordinates


Documentation/driver-api/cxl/linux/cxl-driver.rst

Lines changed: 105 additions & 10 deletions
@@ -34,14 +34,40 @@ into a single memory region. The memory region has been converted to dax. ::
     decoder1.0 decoder5.0 endpoint5 port1 region0
     decoder2.0 decoder5.1 endpoint6 port2 root0

+
+.. kernel-render:: DOT
+   :alt: Digraph of CXL fabric describing host-bridge interleaving
+   :caption: Digraph of CXL fabric with a host-bridge interleave memory region
+
+   digraph foo {
+     "root0" -> "port1";
+     "root0" -> "port3";
+     "root0" -> "decoder0.0";
+     "port1" -> "endpoint5";
+     "port3" -> "endpoint6";
+     "port1" -> "decoder1.0";
+     "port3" -> "decoder3.0";
+     "endpoint5" -> "decoder5.0";
+     "endpoint6" -> "decoder6.0";
+     "decoder0.0" -> "region0";
+     "decoder0.0" -> "decoder1.0";
+     "decoder0.0" -> "decoder3.0";
+     "decoder1.0" -> "decoder5.0";
+     "decoder3.0" -> "decoder6.0";
+     "decoder5.0" -> "region0";
+     "decoder6.0" -> "region0";
+     "region0" -> "dax_region0";
+     "dax_region0" -> "dax0.0";
+   }
+
 For this section we'll explore the devices present in this configuration, but
 we'll explore more configurations in-depth in example configurations below.

 Base Devices
 ------------
 Most devices in a CXL fabric are a `port` of some kind (because each
 device mostly routes requests from one device to the next, rather than
-provide a manageable service).
+provide a direct service).

 Root
 ~~~~
@@ -53,6 +79,8 @@ The Root contains links to:

 * `Host Bridge Ports` defined by ACPI CEDT CHBS.

+* `Downstream Ports` typically connected to `Host Bridge Ports`.
+
 * `Root Decoders` defined by ACPI CEDT CFMWS.

 ::
@@ -150,6 +178,27 @@ device configuration data. ::
     driver label_storage_size pmem serial
     firmware numa_node ram subsystem

+A Memory Device is a discrete base object that is not a port. While the
+physical device it belongs to may also host an `endpoint`, the relationship
+between an `endpoint` and a `memdev` is not captured in sysfs.
+
+Port Relationships
+~~~~~~~~~~~~~~~~~~
+In our example described above, there are four host bridges attached to the
+root, and two of the host bridges have one endpoint attached.
+
+.. kernel-render:: DOT
+   :alt: Digraph of CXL fabric describing host-bridge interleaving
+   :caption: Digraph of CXL fabric with a host-bridge interleave memory region
+
+   digraph foo {
+     "root0" -> "port1";
+     "root0" -> "port2";
+     "root0" -> "port3";
+     "root0" -> "port4";
+     "port1" -> "endpoint5";
+     "port3" -> "endpoint6";
+   }

 Decoders
 --------
@@ -322,6 +371,29 @@ settings (granularity and ways must be the same).
 Endpoint decoders are created during :code:`cxl_endpoint_port_probe` in the
 :code:`cxl_port` driver, and are created based on a PCI device's DVSEC registers.

+Decoder Relationships
+~~~~~~~~~~~~~~~~~~~~~
+In our example described above, there is one root decoder which routes memory
+accesses over two host bridges. Each host bridge has a decoder which routes
+accesses to its single endpoint target. Each endpoint has a decoder which
+translates HPA to DPA and services the memory request.
+
+The driver validates relationships between ports by decoder programming, so
+we can think of decoders being related in a similarly hierarchical fashion to
+ports.
+
+.. kernel-render:: DOT
+   :alt: Digraph of hierarchical relationship between root, switch, and endpoint decoders.
+   :caption: Digraph of CXL root, switch, and endpoint decoders.
+
+   digraph foo {
+     "root0" -> "decoder0.0";
+     "decoder0.0" -> "decoder1.0";
+     "decoder0.0" -> "decoder3.0";
+     "decoder1.0" -> "decoder5.0";
+     "decoder3.0" -> "decoder6.0";
+   }
+
 Regions
 -------

@@ -348,6 +420,17 @@ The interleave settings in a `Memory Region` describe the configuration of the
 `Interleave Set` - and are what can be expected to be seen in the endpoint
 interleave settings.

+.. kernel-render:: DOT
+   :alt: Digraph of CXL memory region relationships between root and endpoint decoders.
+   :caption: Regions are created based on root decoder configurations. Endpoint decoders
+             must be programmed with the same interleave settings as the region.
+
+   digraph foo {
+     "root0" -> "decoder0.0";
+     "decoder0.0" -> "region0";
+     "region0" -> "decoder5.0";
+     "region0" -> "decoder6.0";
+   }

 DAX Region
 ~~~~~~~~~~
@@ -360,7 +443,6 @@ for more details. ::
     dax0.0 devtype modalias uevent
     dax_region driver subsystem

-
 Mailbox Interfaces
 ------------------
 A mailbox command interface for each device is exposed in ::
@@ -418,17 +500,30 @@ the relationships between a decoder and its parent.

 For example, in a `Cross-Link First` interleave setup with 16 endpoints
 attached to 4 host bridges, linux expects the following ways/granularity
-across the root, host bridge, and endpoints respectively. ::
+across the root, host bridge, and endpoints respectively.
+
+.. flat-table:: 4x4 cross-link first interleave settings
+
+  * - decoder
+    - ways
+    - granularity

-                 ways   granularity
-    root          4      256
-    host bridge   4      1024
-    endpoint      16     256
+  * - root
+    - 4
+    - 256
+
+  * - host bridge
+    - 4
+    - 1024
+
+  * - endpoint
+    - 16
+    - 256

 At the root, a given access will be routed to the
 :code:`((HPA / 256) % 4)th` target host bridge. Within a host bridge, it is routed to the
-:code:`((HPA / 1024) % 4)th` target endpoint. Each endpoint will translate
-the access based on the entire 16 device interleave set.
+:code:`((HPA / 1024) % 4)th` target endpoint. Each endpoint translates based
+on the entire 16 device interleave set.

 Unbalanced interleave sets are not supported - decoders at a similar point
 in the hierarchy (e.g. all host bridge decoders) must have the same ways and
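
The routing arithmetic in the hunk above can be checked with a few lines of Python. This is a rough sketch of the two formulas from the text, not the driver's decode logic. It assumes the address is an offset from the start of the region, and the function names `route` and `interleave_position` are hypothetical.

```python
# Values from the 4x4 cross-link first table: (ways, granularity in bytes).
ROOT_WAYS, ROOT_GRANULARITY = 4, 256
HB_WAYS, HB_GRANULARITY = 4, 1024
ENDPOINT_WAYS, ENDPOINT_GRANULARITY = 16, 256


def route(hpa_offset):
    """Return (host bridge index, endpoint index within that bridge).

    Assumption: hpa_offset is counted from the start of the region.
    """
    host_bridge = (hpa_offset // ROOT_GRANULARITY) % ROOT_WAYS
    endpoint = (hpa_offset // HB_GRANULARITY) % HB_WAYS
    return host_bridge, endpoint


def interleave_position(hpa_offset):
    """Position in the full 16-way set, using the endpoint settings."""
    return (hpa_offset // ENDPOINT_GRANULARITY) % ENDPOINT_WAYS


# Sixteen consecutive 256-byte chunks each land on a different endpoint.
targets = [route(chunk * 256) for chunk in range(16)]
print(targets[:5])  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1)]
```

Consecutive 256-byte chunks rotate across the four host bridges first and only then advance to the next endpoint behind each bridge. Under these settings, the pair `(host_bridge, endpoint)` and the 16-way position carry the same information: `interleave_position(x) == endpoint * 4 + host_bridge`.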
@@ -467,7 +562,7 @@ In this example, the CFMWS defines two discrete non-interleaved 4GB regions
 for each host bridge, and one interleaved 8GB region that targets both. This
 would result in 3 root decoders presenting in the root. ::

-    # ls /sys/bus/cxl/devices/root0
+    # ls /sys/bus/cxl/devices/root0/decoder*
     decoder0.0 decoder0.1 decoder0.2

     # cat /sys/bus/cxl/devices/decoder0.0/target_list start size

Documentation/driver-api/cxl/linux/dax-driver.rst

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+DAX Driver Operation
+====================
+The `Direct Access Device` driver was originally designed to provide a
+memory-like access mechanism to memory-like block-devices. It was
+extended to support CXL Memory Devices, which provide user-configured
+memory devices.
+
+The CXL subsystem depends on the DAX subsystem to either:
+
+- Generate a file-like interface to userland via :code:`/dev/daxN.Y`, or
+- Engage the memory-hotplug interface to add CXL memory to the page allocator.
+
+The DAX subsystem exposes this ability through the `cxl_dax_region` driver.
+A `dax_region` provides the translation between a CXL `memory_region` and
+a `DAX Device`.
+
+DAX Device
+==========
+A `DAX Device` is a file-like interface exposed in :code:`/dev/daxN.Y`. A
+memory region exposed via a dax device can be accessed by userland software
+via the :code:`mmap()` system call. The result is direct mappings to the
+CXL capacity in the task's page tables.
+
+Users wishing to manually handle allocation of CXL memory should use this
+interface.
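
As an illustration of the `mmap()` access path described above, here is a minimal userland sketch using Python's standard `mmap` module. The helper name `map_dax` is hypothetical, and the device path in the usage comment is taken from the `dax0.0` example in this patch. It is an assumption, not something this patch states, that a real device-dax node expects the length and offset to respect the device's alignment (often 2 MiB).

```python
import mmap
import os


def map_dax(path, length, offset=0):
    """Map `length` bytes of `path` as a shared, read/write mapping.

    Assumption: for a device-dax node, `length` and `offset` should be
    multiples of the device alignment; a regular file only needs the
    offset to be a multiple of mmap.ALLOCATIONGRANULARITY.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, length,
                         flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE,
                         offset=offset)
    finally:
        # The mmap object keeps its own duplicate of the descriptor.
        os.close(fd)


# Hypothetical usage on a system that exposes /dev/dax0.0:
#   region = map_dax("/dev/dax0.0", 2 * 1024 * 1024)
#   region[0:8] = b"\x00" * 8   # stores go straight to CXL capacity
#   region.close()
```

The same helper works on an ordinary file, which makes it easy to exercise the code path on a machine without CXL hardware.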
+
+kmem conversion
+===============
+The :code:`dax_kmem` driver converts a `DAX Device` into a series of `hotplug
+memory blocks` managed by :code:`mm/memory_hotplug.c`. This capacity
+will be exposed to the kernel page allocator in the user-selected memory
+zone.
+
+The :code:`memmap_on_memory` setting (both global and DAX device local)
+dictates where the kernel allocates the :code:`struct folio` descriptors
+for this memory. If :code:`memmap_on_memory` is set, memory
+hotplug will set aside a portion of the memory block capacity to allocate
+folios. If unset, the memory is allocated via a normal :code:`GFP_KERNEL`
+allocation - and as a result will most likely land on the local NUMA node of the
+CPU executing the hotplug operation.
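
To get a feel for how much capacity the descriptors consume, the arithmetic can be sketched as below. The numbers are assumptions, not values from this patch: a 4 KiB base page and a 64-byte per-page descriptor are common on x86_64 but vary with architecture and kernel configuration, and the kernel may round the reserved area up for alignment. The helper names are hypothetical.

```python
PAGE_SIZE = 4096        # assumed base page size in bytes
DESCRIPTOR_SIZE = 64    # assumed per-page descriptor size in bytes


def memmap_bytes(capacity_bytes):
    """Bytes of descriptors needed to describe `capacity_bytes` of memory."""
    return (capacity_bytes // PAGE_SIZE) * DESCRIPTOR_SIZE


def usable_bytes(capacity_bytes, memmap_on_memory):
    """Capacity left for the page allocator on the hotplugged memory.

    With memmap_on_memory set, descriptors are carved out of the new
    capacity itself. Otherwise they come from already-online memory
    and the full capacity stays usable.
    """
    if memmap_on_memory:
        return capacity_bytes - memmap_bytes(capacity_bytes)
    return capacity_bytes


MIB = 1 << 20
print(memmap_bytes(128 * MIB) // MIB)        # 2
print(usable_bytes(128 * MIB, True) // MIB)  # 126
```

Under these assumptions the descriptors cost 1/64 of the capacity, so a 1 TiB CXL region would need 16 GiB of descriptors. That is the amount that would otherwise be allocated from the local NUMA node of the CPU running the hotplug operation when `memmap_on_memory` is unset.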