(docs) Describe the flows of setting NUMA node affinity in Xen by xenopsd #6335

Merged
80 changes: 69 additions & 11 deletions doc/content/lib/xenctrl/xc_domain_node_setaffinity.md
@@ -1,13 +1,32 @@
---
title: xc_domain_node_setaffinity()
description: Set a Xen domain's NUMA node affinity
description: Set a Xen domain's NUMA node affinity for memory allocations
mermaid:
force: true
---

`xc_domain_node_setaffinity()` controls the NUMA node affinity of a domain.
`xc_domain_node_setaffinity()` controls the NUMA node affinity of a domain,
but it only updates the domain's `d->node_affinity` mask in the Xen hypervisor.
The Xen memory allocator reads this mask as its second preference when choosing
the NUMA node to allocate this domain's memory from.

By default, Xen enables the `auto_node_affinity` feature flag,
where setting the vCPU affinity also sets the NUMA node affinity for
memory allocations to be aligned with the vCPU affinity of the domain.
> [!info] Preferences of the Xen memory allocator:
> 1. A NUMA node passed to the allocator directly takes precedence, if present.
> 2. Then, if the allocation is for a domain, its `node_affinity` mask is tried.
> 3. Finally, it falls back to spreading the pages over all remaining NUMA nodes.
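
For illustration, here is a minimal sketch of how a toolstack could call this
libxenctrl function. It assumes the usual `xc_interface_open()` lifecycle and
the `xc_nodemap_alloc()` helper; the error handling is simplified, and this is
a sketch, not a prescription for how `xenopsd` should use the call.

```c
/* Minimal sketch: prefer NUMA node 0 for this domain's memory allocations.
 * The nodemap helper and error handling are simplified assumptions. */
#include <stdlib.h>
#include <xenctrl.h>

int set_node0_affinity(uint32_t domid)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    xc_nodemap_t nodemap;
    int rc = -1;

    if ( !xch )
        return -1;

    nodemap = xc_nodemap_alloc(xch);    /* zero-initialised node bitmap */
    if ( nodemap )
    {
        nodemap[0] |= 1;                /* set bit 0: NUMA node 0 */
        /* Updates d->node_affinity and disables d->auto_node_affinity */
        rc = xc_domain_node_setaffinity(xch, domid, nodemap);
        free(nodemap);
    }

    xc_interface_close(xch);
    return rc;
}
```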

As this call has no practical effect on the Xen scheduler, vCPU affinities
need to be set separately anyway.

The domain's `auto_node_affinity` flag is enabled by default by Xen. This means
that when setting vCPU affinities, Xen updates the `d->node_affinity` mask
to consist of the NUMA nodes to which its vCPUs have affinity.

See [xc_vcpu_setaffinity()](xc_vcpu_setaffinity) for more information
on how `d->auto_node_affinity` is used to set the NUMA node affinity.

Thus, so far, there is no obvious need to call `xc_domain_node_setaffinity()`
when building a domain.
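
As a side note on that default path: when only the soft vCPU affinity is set
and `auto_node_affinity` is still enabled, Xen recomputes `d->node_affinity` as
a side effect. The sketch below is hedged: the separate hard/soft cpumap
parameters and the flags argument match the current libxenctrl signature, but
the helper usage and cpumask contents are illustrative assumptions.

```c
/* Sketch: set only the soft affinity of vCPU 0 to pCPUs 0-7. With
 * auto_node_affinity enabled (the default), Xen recomputes the domain's
 * node_affinity from the vCPU affinities as a side effect. */
#include <stdlib.h>
#include <xenctrl.h>

int set_soft_affinity_low8(xc_interface *xch, uint32_t domid)
{
    xc_cpumap_t hard = xc_cpumap_alloc(xch);   /* zero-initialised bitmaps */
    xc_cpumap_t soft = xc_cpumap_alloc(xch);
    int rc = -1;

    if ( hard && soft )
    {
        soft[0] = 0xff;    /* pCPUs 0-7 */
        /* Only XEN_VCPUAFFINITY_SOFT is passed, so the hard affinity is
         * left unchanged; both maps are in/out parameters in this API. */
        rc = xc_vcpu_setaffinity(xch, domid, 0, hard, soft,
                                 XEN_VCPUAFFINITY_SOFT);
    }

    free(hard);
    free(soft);
    return rc;
}
```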

Setting the NUMA node affinity using this call can be used,
for example, when there might not be enough memory on the
@@ -63,18 +82,57 @@ https://github.com/xen-project/xen/blob/master/xen/common/domain.c#L943-L970"
This function implements the functionality of `xc_domain_node_setaffinity`
to set the NUMA affinity of a domain as described above.
If the new_affinity does not intersect the `node_online_map`,
it returns `-EINVAL`, otherwise on success `0`.
it returns `-EINVAL`. Otherwise, it succeeds and returns `0`.

When the `new_affinity` is a specific set of NUMA nodes, it updates the NUMA
`node_affinity` of the domain to these nodes and disables `auto_node_affinity`
for this domain. It also notifies the Xen scheduler of the change.
`node_affinity` of the domain to these nodes and disables `d->auto_node_affinity`
for this domain. With `d->auto_node_affinity` disabled,
[xc_vcpu_setaffinity()](xc_vcpu_setaffinity) no longer updates the NUMA affinity
of this domain.

If `new_affinity` has all bits set, it re-enables `d->auto_node_affinity`
for this domain and calls
[domain_update_node_aff()](https://github.com/xen-project/xen/blob/e16acd80/xen/common/sched/core.c#L1809-L1876)
to re-derive the domain's `node_affinity` mask from the current hard and
soft affinities of the domain's online vCPUs.
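
The behaviour described above corresponds roughly to the following simplified
sketch (based on the linked Xen source; locking, the scheduler notification and
exact helper names are elided or assumed):

```c
/* Simplified sketch of domain_set_node_affinity() as described above;
 * not the exact Xen source. */
int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity)
{
    /* Reject affinities that do not intersect the online NUMA nodes. */
    if ( !nodes_intersects(*affinity, node_online_map) )
        return -EINVAL;

    if ( nodes_full(*affinity) )
    {
        /* "all nodes": derive node_affinity from vCPU affinities again. */
        d->auto_node_affinity = 1;
    }
    else
    {
        /* Pin node_affinity to the requested nodes from now on. */
        d->auto_node_affinity = 0;
        d->node_affinity = *affinity;
    }

    /* Recompute/propagate the domain's node affinity
     * (a thin wrapper around domain_update_node_aff()). */
    domain_update_node_affinity(d);

    return 0;
}
```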

### Flowchart in relation to xc_vcpu_setaffinity()

The effect of `domain_set_node_affinity()` can be seen more clearly in the
following flowchart, which shows how `xc_vcpu_setaffinity()` is currently used
to set the NUMA affinity of a new domain and how `domain_set_node_affinity()`
relates to it:

This sets the preference the memory allocator to the new NUMA nodes,
and in theory, it could also alter the behaviour of the scheduler.
This of course depends on the scheduler and its configuration.
{{% include "xc_vcpu_setaffinity-xenopsd-notes.md" %}}
{{% include "xc_vcpu_setaffinity-xenopsd.md" %}}

`xc_domain_node_setaffinity` can be used to set the domain's `node_affinity`
(which is normally set by `xc_vcpu_setaffinity`) to different NUMA nodes.

#### No effect on the Xen scheduler

Currently, the node affinity does not affect the Xen scheduler:
If `d->node_affinity` is set before vCPU creation, the initial pCPU
of a new vCPU is the first pCPU of the first NUMA node in the domain's
`node_affinity`. This changes further when one or more `cpupools` are set up.
As this only determines the initial pCPU of the vCPU, it alone does not change
the scheduling of the Xen Credit scheduler, which reschedules vCPUs to other pCPUs.
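
Purely to illustrate the statement above (this is not actual Xen code, and the
helper names are assumptions), the initial pCPU selection can be pictured as:

```c
/* Illustration only: derive an initial pCPU from the domain's node_affinity
 * as described above. The scheduler may migrate the vCPU away immediately. */
static unsigned int sketch_initial_pcpu(const struct domain *d)
{
    /* First NUMA node set in the domain's node_affinity mask. */
    unsigned int node = first_node(d->node_affinity);

    /* First pCPU belonging to that NUMA node. */
    return cpumask_first(&node_to_cpumask(node));
}
```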

## Notes on future design improvements

### It may be possible to call it before vCPUs are created

When done early, before vCPU creation, some domain-related data structures
could be allocated using the domain's `d->node_affinity` NUMA node mask.

With further changes in Xen and `xenopsd`, Xen could allocate the vCPU structs
on the affine NUMA nodes of the domain.

For this, `xenopsd` would have to call `xc_domain_node_setaffinity()`
before vCPU creation, after having decided the domain's NUMA placement,
preferably also claiming the required memory for the domain to ensure
that the domain's memory will be populated from the same NUMA node(s).
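
A hedged sketch of this proposed ordering from the toolstack side is shown
below. The split between `xenopsd` and `xenguest` is left open, the function
and parameter names are illustrative, and `xc_domain_claim_pages()` is only
one way to reserve the memory up front:

```c
/* Sketch of the proposed ordering (illustrative, not current xenopsd
 * behaviour): claim memory and set the NUMA node affinity before any
 * vCPUs are created. */
#include <xenctrl.h>

int build_with_node_affinity(xc_interface *xch, uint32_t domid,
                             xc_nodemap_t placement_nodes,
                             unsigned long nr_pages)
{
    int rc;

    /* Reserve the domain's memory up front. */
    rc = xc_domain_claim_pages(xch, domid, nr_pages);
    if ( rc )
        return rc;

    /* Record the NUMA placement before any vCPU structures are allocated,
     * so that (with further Xen changes) they could be allocated on the
     * domain's affine NUMA nodes. */
    rc = xc_domain_node_setaffinity(xch, domid, placement_nodes);
    if ( rc )
        return rc;

    /* vCPU creation and the rest of the domain build would follow here. */
    return 0;
}
```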

This call cannot influence the past: The `xenopsd`
[VM_create](../../xenopsd/walkthroughs/VM.start.md#2-create-a-xen-domain)
micro-op calls `Xenctrl.domain_create`. It currently creates
30 changes: 30 additions & 0 deletions doc/content/lib/xenctrl/xc_vcpu_setaffinity-simplified.md
@@ -0,0 +1,30 @@
---
title: Simplified flowchart of xc_vcpu_setaffinity()
description: See lib/xenctrl/xc_vcpu_setaffinity-xenopsd.md for an extended version
hidden: true
---
```mermaid
flowchart TD
subgraph libxenctrl
xc_vcpu_setaffinity("<tt>xc_vcpu_setaffinity()")--hypercall-->xen
end
subgraph xen[Xen Hypervisor]
direction LR
vcpu_set_affinity("<tt>vcpu_set_affinity()</tt><br>set the vCPU affinity")
-->check_auto_node{"Is the domain's<br><tt>auto_node_affinity</tt><br>enabled?"}
--"yes<br>(default)"-->
auto_node_affinity("Set the<br>domain's<br><tt>node_affinity</tt>
mask as well<br>(used for further<br>NUMA memory<br>allocation)")

click xc_vcpu_setaffinity
"https://github.com/xen-project/xen/blob/7cf16387/tools/libs/ctrl/xc_domain.c#L199-L250" _blank
click vcpu_set_affinity
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1353-L1393" _blank
click domain_update_node_aff
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1809-L1876" _blank
click check_auto_node
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1840-L1870" _blank
click auto_node_affinity
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1867-L1869" _blank
end
```
13 changes: 13 additions & 0 deletions doc/content/lib/xenctrl/xc_vcpu_setaffinity-xenopsd-notes.md
@@ -0,0 +1,13 @@
---
title: Notes for the flowchart on the use of setaffinity for VM.start
hidden: true
---
In the flowchart, two code paths are set in bold:
- The path taken when `Host.numa_affinity_policy` is at its default (off) in `xenopsd`.
- The default path of `xc_vcpu_setaffinity(XEN_VCPUAFFINITY_SOFT)` in Xen,
taken when the domain's `auto_node_affinity` flag is enabled (the default), showing
how, in this default case, an update to the vCPU affinity also updates the
domain's `node_affinity`.

[xenguest](../../xenopsd/walkthroughs/VM.build/xenguest/) uses the Xenstore
to read the static domain configuration that it needs to build the domain.
176 changes: 176 additions & 0 deletions doc/content/lib/xenctrl/xc_vcpu_setaffinity-xenopsd.md
@@ -0,0 +1,176 @@
---
title: Flowchart of the use of xc_vcpu_setaffinity() by xenopsd
description: Shows how xenopsd uses xc_vcpu_setaffinity() to set NUMA affinity
hidden: true
---
```mermaid
flowchart TD

subgraph VM.create["xenopsd VM.create"]

%% Is xe vCPU-params:mask= set? If yes, write to Xenstore:

is_xe_vCPUparams_mask_set?{"

Is
<tt>xe vCPU-params:mask=</tt>
set? Example: <tt>1,2,3</tt>
(Is used to enable vCPU<br>hard-affinity)

"} --"yes"--> set_hard_affinity("Write hard-affinity to XenStore:
<tt>platform/vcpu/#domid/affinity</tt>
(xenguest will read this and other configuration data
from Xenstore)")

end

subgraph VM.build["xenopsd VM.build"]

%% Labels of the decision nodes

is_Host.numa_affinity_policy_set?{
Is<p><tt>Host.numa_affinity_policy</tt><p>set?}
has_hard_affinity?{
Is hard-affinity configured in <p><tt>platform/vcpu/#domid/affinity</tt>?}

%% Connections from VM.create:
set_hard_affinity --> is_Host.numa_affinity_policy_set?
is_xe_vCPUparams_mask_set? == "no"==> is_Host.numa_affinity_policy_set?

%% The Subgraph itself:

%% Check Host.numa_affinity_policy

is_Host.numa_affinity_policy_set?

%% If Host.numa_affinity_policy is "best_effort":

-- Host.numa_affinity_policy is<p><tt>best_effort -->

%% If has_hard_affinity is set, skip numa_placement:

has_hard_affinity?
--"yes"-->exec_xenguest

%% If has_hard_affinity is not set, run numa_placement:

has_hard_affinity?
--"no"-->numa_placement-->exec_xenguest

%% If Host.numa_affinity_policy is off (default, for now),
%% skip NUMA placement:

is_Host.numa_affinity_policy_set?
=="default: disabled"==>
exec_xenguest
end

%% xenguest subgraph

subgraph xenguest

exec_xenguest

==> stub_xc_hvm_build("<tt>stub_xc_hvm_build()")

    ==> configure_vcpus("<tt>configure_vcpus()")

%% Decision
==> set_hard_affinity?{"
Is <tt>platform/<br>vcpu/#domid/affinity</tt>
set?"}

end

%% do_domctl Hypercalls

numa_placement
--Set the NUMA placement using soft-affinity-->
XEN_VCPUAFFINITY_SOFT("<tt>xc_vcpu_setaffinity(SOFT)")
==> do_domctl

set_hard_affinity?
--yes-->
XEN_VCPUAFFINITY_HARD("<tt>xc_vcpu_setaffinity(HARD)")
--> do_domctl

xc_domain_node_setaffinity("<tt>xc_domain_node_setaffinity()</tt>
and
<tt>xc_domain_node_getaffinity()")
<--> do_domctl

%% Xen subgraph

subgraph xen[Xen Hypervisor]

subgraph domain_update_node_affinity["domain_update_node_affinity()"]
domain_update_node_aff("<tt>domain_update_node_aff()")
==> check_auto_node{"Is domain's<br><tt>auto_node_affinity</tt><br>enabled?"}
=="yes (default)"==>set_node_affinity_from_vcpu_affinities("
Calculate the domain's <tt>node_affinity</tt> mask from vCPU affinity
(used for further NUMA memory allocation for the domain)")
end

do_domctl{"do_domctl()<br>op->cmd=?"}
==XEN_DOMCTL_setvcpuaffinity==>
vcpu_set_affinity("<tt>vcpu_set_affinity()</tt><br>set the vCPU affinity")
==>domain_update_node_aff
do_domctl
--XEN_DOMCTL_setnodeaffinity (not used currently)
-->is_new_affinity_all_nodes?

subgraph domain_set_node_affinity["domain_set_node_affinity()"]

is_new_affinity_all_nodes?{new_affinity<br>is #34;all#34;?}

--is #34;all#34;

--> enable_auto_node_affinity("<tt>auto_node_affinity=1")
--> domain_update_node_aff

is_new_affinity_all_nodes?

--not #34;all#34;

--> disable_auto_node_affinity("<tt>auto_node_affinity=0")
--> domain_update_node_aff
end

%% setting and getting the struct domain's node_affinity:

disable_auto_node_affinity
--node_affinity=new_affinity-->
domain_node_affinity

set_node_affinity_from_vcpu_affinities
==> domain_node_affinity@{ shape: bow-rect,label: "domain:&nbsp;node_affinity" }
--XEN_DOMCTL_getnodeaffinity--> do_domctl

end
click is_Host.numa_affinity_policy_set?
"https://github.com/xapi-project/xen-api/blob/90ef043c1f3a3bc20f1c5d3ccaaf6affadc07983/ocaml/xenopsd/xc/domain.ml#L951-L962"
click numa_placement
"https://github.com/xapi-project/xen-api/blob/90ef043c/ocaml/xenopsd/xc/domain.ml#L862-L897"
click stub_xc_hvm_build
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2329-L2436" _blank
click get_flags
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1164-L1288" _blank
click do_domctl
"https://github.com/xen-project/xen/blob/7cf163879/xen/common/domctl.c#L282-L894" _blank
click domain_set_node_affinity
"https://github.com/xen-project/xen/blob/7cf163879/xen/common/domain.c#L943-L970" _blank
click configure_vcpus
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1297-L1348" _blank
click set_hard_affinity?
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1305-L1326" _blank
click xc_vcpu_setaffinity
"https://github.com/xen-project/xen/blob/7cf16387/tools/libs/ctrl/xc_domain.c#L199-L250" _blank
click vcpu_set_affinity
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1353-L1393" _blank
click domain_update_node_aff
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1809-L1876" _blank
click check_auto_node
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1840-L1870" _blank
click set_node_affinity_from_vcpu_affinities
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1867-L1869" _blank
```