Commit caab0df

Domain.build docs: Improve notes on NUMA node_affinity, move to new page (#6302)
Improve the remarks on NUMA node_affinity and move them into a dedicated walk-through. Also add a diagram for xc_domain_node_setaffinity().

2 parents: 183c467 + 5abc2fa

4 files changed: +110 −17 lines

doc/content/lib/_index.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+title: Libraries
+hidden: true
+---
+{{% children description=true %}}

doc/content/lib/xenctrl/_index.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+title: libxenctrl
+description: Xen Control library for controlling the Xen hypervisor
+---
+{{% children description=true %}}

doc/content/lib/xenctrl/xc_domain_node_setaffinity.md

Lines changed: 88 additions & 0 deletions

@@ -0,0 +1,88 @@
---
title: xc_domain_node_setaffinity()
description: Set a Xen domain's NUMA node affinity
---

`xc_domain_node_setaffinity()` controls the NUMA node affinity of a domain.

By default, Xen enables the `auto_node_affinity` feature flag, so that
setting the vCPU affinity also sets the NUMA node affinity, and memory
allocations are aligned with the vCPU affinity of the domain.

Setting the NUMA node affinity explicitly with this call is useful,
for example, when there might not be enough memory on the
preferred NUMA node, but other NUMA nodes have enough free memory
for the system memory of the domain.

In terms of future NUMA design, it might be even more favourable to
have a strategy in `xenguest` where, in such cases, the superpages
of the preferred node are used first and a fallback to neighbouring
NUMA nodes only happens to the extent necessary.

Likely, the future allocation strategy should be passed to `xenguest`
using Xenstore, like the other platform parameters for the VM.
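
To illustrate the call from a `libxenctrl` client, here is a minimal,
hypothetical C sketch (it is not part of this commit). It assumes the
target `domid` is already known (`domid = 1` and `node = 0` are
placeholders) and sets the node bit by hand so that only
`xc_nodemap_alloc()` and `xc_domain_node_setaffinity()` are relied upon:

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

int main(void)
{
    uint32_t domid = 1;     /* placeholder: the domain to update */
    unsigned int node = 0;  /* placeholder: the preferred NUMA node */

    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    if (!xch)
        return 1;

    /* Allocate a zeroed node bitmap sized for this host */
    xc_nodemap_t nodemap = xc_nodemap_alloc(xch);
    if (!nodemap) {
        xc_interface_close(xch);
        return 1;
    }

    /* Mark the preferred node (one bit per NUMA node) */
    nodemap[node / 8] |= 1 << (node % 8);

    /* Returns 0 on success; per the walk-through below, a nodemap
     * that does not include any online node is rejected (EINVAL). */
    int rc = xc_domain_node_setaffinity(xch, domid, nodemap);
    if (rc)
        fprintf(stderr, "xc_domain_node_setaffinity failed: %d\n", rc);

    free(nodemap);
    xc_interface_close(xch);
    return rc ? 1 : 0;
}
```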

## Walk-through of xc_domain_node_setaffinity()

```mermaid
classDiagram
class `xc_domain_node_setaffinity()` {
    +xch: xc_interface #42;
    +domid: uint32_t
    +nodemap: xc_nodemap_t
    0(on success)
    -EINVAL(if a node in the nodemask is not online)
}
click `xc_domain_node_setaffinity()` href "
https://github.com/xen-project/xen/blob/master/tools/libs/ctrl/xc_domain.c#L122-L158"

`xc_domain_node_setaffinity()` --> `Xen hypercall: do_domctl()`
`xc_domain_node_setaffinity()` <-- `Xen hypercall: do_domctl()`
class `Xen hypercall: do_domctl()` {
    Calls domain_set_node_affinity#40;#41; and returns its return value
    Passes: domain (struct domain *, looked up using the domid)
    Passes: new_affinity (nodemask, converted from xc_nodemap_t)
}
click `Xen hypercall: do_domctl()` href "
https://github.com/xen-project/xen/blob/master/xen/common/domctl.c#L516-L525"

`Xen hypercall: do_domctl()` --> `domain_set_node_affinity()`
`Xen hypercall: do_domctl()` <-- `domain_set_node_affinity()`
class `domain_set_node_affinity()` {
    domain: struct domain
    new_affinity: nodemask
    0(on success, the domain's node_affinity is updated)
    -EINVAL(if a node in the nodemask is not online)
}
click `domain_set_node_affinity()` href "
https://github.com/xen-project/xen/blob/master/xen/common/domain.c#L943-L970"
```

### domain_set_node_affinity()

This function implements the functionality of `xc_domain_node_setaffinity`
to set the NUMA affinity of a domain as described above.
If the `new_affinity` does not intersect the `node_online_map`,
it returns `-EINVAL`; otherwise, it returns `0` on success.

When the `new_affinity` is a specific set of NUMA nodes, it updates the NUMA
`node_affinity` of the domain to these nodes and disables `auto_node_affinity`
for this domain. It also notifies the Xen scheduler of the change.

This sets the preference of the memory allocator to the new NUMA nodes,
and in theory, it could also alter the behaviour of the scheduler.
This, of course, depends on the scheduler and its configuration.
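
The following self-contained C sketch mirrors the behaviour described above.
It is not the Xen source: `nodemask_t`, `struct domain`, `node_online_map`
and `domain_update_node_affinity()` are simplified stand-ins for the
hypervisor's own types and helpers, reduced to what this page discusses:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in types: one bit per NUMA node, and only the two domain
 * fields that matter for this walk-through. */
typedef uint64_t nodemask_t;
static nodemask_t node_online_map;      /* set of online NUMA nodes */

struct domain {
    nodemask_t node_affinity;           /* preferred nodes for memory allocation */
    bool auto_node_affinity;            /* derive node_affinity from vCPU affinity? */
};

/* Stand-in for the scheduler notification mentioned above. */
static void domain_update_node_affinity(struct domain *d) { (void)d; }

/* Simplified model of the flow described in this section. */
static int set_node_affinity_sketch(struct domain *d, nodemask_t new_affinity)
{
    /* A mask that does not intersect node_online_map is rejected */
    if (!(new_affinity & node_online_map))
        return -EINVAL;

    /* Record the explicit preference and stop deriving it from vCPU affinity */
    d->node_affinity = new_affinity;
    d->auto_node_affinity = false;

    /* Let the scheduler react to the changed node affinity */
    domain_update_node_affinity(d);

    return 0;
}
```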

## Notes on future design improvements

This call cannot influence the past: the `xenopsd`
[VM_create](../../xenopsd/walkthroughs/VM.start.md#2-create-a-xen-domain)
micro-op calls `Xenctrl.domain_create`, which currently creates
the domain's data structures before `numa_placement` is done.

Improving `Xenctrl.domain_create` to pass a NUMA node
for allocating the hypervisor's data structures (e.g. vCPU)
of the domain would require changes
to the Xen hypervisor and to the `xenopsd`
[VM_create](../../xenopsd/walkthroughs/VM.start.md#2-create-a-xen-domain)
micro-op.

doc/content/xenopsd/walkthroughs/VM.build/Domain.build.md

Lines changed: 12 additions & 17 deletions
@@ -111,23 +111,6 @@ setting the vCPU affinity causes the Xen hypervisor to activate
 NUMA node affinity for memory allocations to be aligned with
 the vCPU affinity of the domain.

-Note: See the Xen domain's
-[auto_node_affinity](https://wiki.xenproject.org/wiki/NUMA_node_affinity_in_the_Xen_hypervisor)
-feature flag, which controls this, which can be overridden in the
-Xen hypervisor if needed for specific VMs.
-
-This can be used, for example, when there might not be enough memory
-on the preferred NUMA node, but there are other NUMA nodes that have
-enough free memory among with the memory allocations shall be done.
-
-In terms of future NUMA design, it might be even more favourable to
-have a strategy in `xenguest` where in such cases, the superpages
-of the preferred node are used first and a fallback to neighbouring
-NUMA nodes only happens to the extent necessary.
-
-Likely, the future allocation strategy should be passed to `xenguest`
-using Xenstore like the other platform parameters for the VM.
-
 Summary: This passes the information to the hypervisor that memory
 allocation for this domain should preferably be done from this NUMA node.

@@ -136,3 +119,15 @@ allocation for this domain should preferably be done from this NUMA node.
 With the preparation in `build_pre` completed, `Domain.build`
 [calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1127-L1155)
 the `xenguest` function to invoke the [xenguest](xenguest) program to build the domain.
+
+## Notes on future design improvements
+
+The Xen domain feature flag
+[domain->auto_node_affinity](https://wiki.xenproject.org/wiki/NUMA_node_affinity_in_the_Xen_hypervisor)
+can be disabled by calling
+[xc_domain_node_setaffinity()](../../references/xc_domain_node_setaffinity.md)
+to set a specific NUMA node affinity in special cases:
+
+This can be used, for example, when there might not be enough memory on the preferred
+NUMA node, and there are other NUMA nodes (in the same CPU package) to use
+([reference](../../../lib/xenctrl/xc_domain_node_setaffinity.md)).
