Hugo docs: Add dedicated walk-throughs for VM.build and xenguest #6296

Merged
137 changes: 137 additions & 0 deletions doc/content/xenopsd/walkthroughs/VM.build/Domain.build.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
---
title: Domain.build
description:
"Prepare the build of a VM: Wait for scrubbing, do NUMA placement, run xenguest."
---

## Overview

```mermaid
flowchart LR
subgraph xenopsd VM_build[
xenopsd thread pool with two VM_build micro#8209;ops:
During parallel VM_start, many threads run this in parallel!
]
direction LR
build_domain_exn[
VM.build_domain_exn
from thread pool Thread #1
] --> Domain.build
Domain.build --> build_pre
build_pre --> wait_xen_free_mem
build_pre -->|if NUMA/Best_effort| numa_placement
Domain.build --> xenguest[Invoke xenguest]
click Domain.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210" _blank
click build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225" _blank
click wait_xen_free_mem "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272" _blank
click numa_placement "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897" _blank
click build_pre "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964" _blank
click xenguest "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1139-L1146" _blank

build_domain_exn2[
VM.build_domain_exn
from thread pool Thread #2] --> Domain.build2[Domain.build]
Domain.build2 --> build_pre2[build_pre]
build_pre2 --> wait_xen_free_mem2[wait_xen_free_mem]
build_pre2 -->|if NUMA/Best_effort| numa_placement2[numa_placement]
Domain.build2 --> xenguest2[Invoke xenguest]
click Domain.build2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210" _blank
click build_domain_exn2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225" _blank
click wait_xen_free_mem2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272" _blank
click numa_placement2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897" _blank
click build_pre2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964" _blank
click xenguest2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1139-L1146" _blank
end
```

[`VM.build_domain_exn`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248)
[calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225)
[`Domain.build`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210)
which then calls:
- `build_pre` to prepare the build of a VM:
  - Wait, if necessary, for the Xen memory scrubber to free enough memory.
  - If the `xe` config `numa_placement` is set to `Best_effort`, invoke the NUMA placement algorithm.
- `xenguest` to invoke the [xenguest](xenguest) program to set up the domain's system memory.

## Domain Build Preparation using build_pre

[`Domain.build`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210)
[calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1137)
the [function `build_pre`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964)
(which is also used for VM restore). It must:

1. [Call](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L902-L911)
[wait_xen_free_mem](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272)
to wait, if necessary, for the Xen memory scrubber to catch up reclaiming memory (CA-39743)
2. Call the hypercall to set the timer mode
3. Call the hypercall to set the number of vCPUs
4. As described in the [NUMA feature description](../../toolstack/features/NUMA),
when the `xe` configuration option `numa_placement` is set to `Best_effort`,
except when the VM has a hard affinity set, invoke the `numa_placement` function:

```ml
match !Xenops_server.numa_placement with
| Any ->
    ()
| Best_effort ->
    log_reraise (Printf.sprintf "NUMA placement") (fun () ->
        if has_hard_affinity then
          D.debug "VM has hard affinity set, skipping NUMA optimization"
        else
          numa_placement domid ~vcpus
            ~memory:(Int64.mul memory.xen_max_mib 1048576L)
    )
```
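Step 1 above (waiting for the memory scrubber) can be pictured as a bounded polling loop. The following is a minimal sketch under stated assumptions, not the code of `wait_xen_free_mem`: the names `wait_for_free_mem`, `get_free_kib`, `sleep` and `simulated_free_kib` are hypothetical, and the free-memory reading is simulated rather than taken from Xen.

```ocaml
(* Hypothetical sketch, not the xenopsd implementation: poll a free-memory
   reading until it covers the requirement or the attempts run out. *)
let wait_for_free_mem ?(max_attempts = 64) ~(sleep : unit -> unit)
    ~(get_free_kib : unit -> int64) ~(required_kib : int64) () =
  let rec loop attempt =
    if Int64.compare (get_free_kib ()) required_kib >= 0 then
      true (* enough memory is free: the build can proceed *)
    else if attempt >= max_attempts then
      false (* give up: the caller decides how to report the failure *)
    else (
      sleep () ;
      loop (attempt + 1)
    )
  in
  loop 1

(* Simulated host (assumption for illustration): on every poll, the
   scrubber has released another 256 KiB. *)
let simulated_free_kib () =
  let free = ref 0L in
  fun () ->
    free := Int64.add !free 256L ;
    !free
```

Passing `sleep` and `get_free_kib` as functions keeps the sketch independent of any Xen binding, so the loop can be exercised without a hypervisor.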

## NUMA placement

`build_pre` passes the `domid`, the number of `vCPUs` and `xen_max_mib`
(converted to bytes) to the
[numa_placement](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897)
function, which runs the algorithm to find the best NUMA placement.

When the algorithm returns a NUMA node to use, `numa_placement` calls the
Xen hypercalls to set the vCPU affinity to this NUMA node.
The planning step of the function looks like this:

```ml
let vm = NUMARequest.make ~memory ~vcpus in
let nodea =
  match !numa_resources with
  | None ->
      Array.of_list nodes
  | Some a ->
      Array.map2 NUMAResource.min_memory (Array.of_list nodes) a
in
numa_resources := Some nodea ;
Softaffinity.plan ~vm host nodea
```
```
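One reading of the excerpt above is that `numa_resources` keeps a running, conservative estimate per node: each new reading is merged with the previous one by taking the smaller memory value. The sketch below illustrates only that merge. It assumes, from its name alone, that `NUMAResource.min_memory` keeps the minimum of two memory values. The `node` record, `previous_estimate` and `update_estimate` are hypothetical stand-ins, not xenopsd types.

```ocaml
(* Hypothetical stand-in for a per-node resource record: free memory only. *)
type node = {id: int; free_bytes: int64}

(* Assumed behaviour: keep the smaller of the fresh reading and the
   previously recorded estimate for the same node. *)
let min_memory fresh previous =
  {fresh with free_bytes= min fresh.free_bytes previous.free_bytes}

(* Running estimate, playing the role of numa_resources in the excerpt. *)
let previous_estimate : node array option ref = ref None

let update_estimate (fresh : node list) =
  let merged =
    match !previous_estimate with
    | None ->
        Array.of_list fresh (* first call: take the reading as it is *)
    | Some prev ->
        (* Array.map2 raises Invalid_argument if the lengths differ *)
        Array.map2 min_memory (Array.of_list fresh) prev
  in
  previous_estimate := Some merged ;
  merged
```

With this merge, an estimate can only shrink between calls, which is the conservative choice when several builds run in parallel.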

With Xen's default `auto_node_affinity` feature, setting the vCPU
affinity also makes the Xen hypervisor set the NUMA node affinity
for the domain's memory allocations to match its vCPU affinity.

Note: The Xen domain's
[auto_node_affinity](https://wiki.xenproject.org/wiki/NUMA_node_affinity_in_the_Xen_hypervisor)
feature flag controls this behaviour. It can be overridden in the
Xen hypervisor if needed for specific VMs.

Overriding it can be useful, for example, when the preferred NUMA node
might not have enough memory, but other NUMA nodes have enough free
memory from which the allocations can be made.

In terms of future NUMA design, it might be even more favourable to
have a strategy in `xenguest` where in such cases, the superpages
of the preferred node are used first and a fallback to neighbouring
NUMA nodes only happens to the extent necessary.

Likely, the future allocation strategy should be passed to `xenguest`
using Xenstore like the other platform parameters for the VM.

Summary: This passes the information to the hypervisor that memory
allocation for this domain should preferably be done from this NUMA node.
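The fallback idea outlined above for a possible future design (use the preferred node first, spill to other nodes only as far as necessary) could be sketched as follows. This is an illustration under stated assumptions, not an existing `xenguest` or xenopsd feature: `plan_allocation` is a hypothetical name, amounts are plain byte counts, and "neighbouring" nodes are simplified to index order instead of real NUMA distances.

```ocaml
(* Hypothetical sketch: split a memory request across NUMA nodes, taking as
   much as possible from the preferred node before touching other nodes.
   Returns [Some [(node, bytes); ...]] or [None] if the host lacks memory. *)
let plan_allocation ~(preferred : int) ~(free : int64 array)
    ~(request : int64) =
  (* Visit the preferred node first, then the others in index order
     (a simplification: real code would order by NUMA distance). *)
  let others =
    List.filter (fun n -> n <> preferred)
      (List.init (Array.length free) Fun.id)
  in
  let rec go remaining acc = function
    | _ when remaining = 0L ->
        Some (List.rev acc)
    | [] ->
        None (* all nodes visited and the request is still not covered *)
    | n :: rest ->
        let take = min remaining free.(n) in
        let acc = if take > 0L then (n, take) :: acc else acc in
        go (Int64.sub remaining take) acc rest
  in
  go request [] (preferred :: others)
```

For example, with 10 units free on the preferred node and a request of 12, the plan takes all 10 from the preferred node and only the remaining 2 from the next node.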

## Invoke the xenguest program

With the preparation in `build_pre` completed, `Domain.build`
[calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1127-L1155)
the `xenguest` function to invoke the [xenguest](xenguest) program to build the domain.
58 changes: 58 additions & 0 deletions doc/content/xenopsd/walkthroughs/VM.build/VM_build.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
---
title: VM_build micro-op
linkTitle: VM_build μ-op
description: Overview of the VM_build μ-op (runs after the VM_create μ-op created the domain).
weight: 10
---

## Overview

On Xen, `Xenctrl.domain_create` creates an empty domain and
returns the domain ID (`domid`) of the new domain to `xenopsd`.

In the `build` phase, the `xenguest` program is called to create
the system memory layout of the domain, set vCPU affinity and a
lot more.

The [VM_build](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2255-L2271)
micro-op collects the VM build parameters and calls
[VM.build](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2290-L2291),
which calls
[VM.build_domain](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2250-L2288),
which calls
[VM.build_domain_exn](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248)
which calls [Domain.build](Domain.build):

```mermaid
flowchart
subgraph xenopsd VM_build[xenopsd VM_build micro#8209;op]
direction LR
VM_build --> VM.build
VM.build --> VM.build_domain
VM.build_domain --> VM.build_domain_exn
VM.build_domain_exn --> Domain.build
click VM_build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2255-L2271" _blank
click VM.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2290-L2291" _blank
click VM.build_domain "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2250-L2288" _blank
click VM.build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248" _blank
click Domain.build "../Domain.build/index.html"
end
```

The function
[VM.build_domain_exn](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024)
must:

1. Run pygrub (or eliloader) to extract the kernel and initrd, if necessary
2. [Call](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225)
[Domain.build](Domain.build)
to:
- optionally run NUMA placement and
- invoke [xenguest](VM.build/xenguest) to set up the domain memory.

See the walk-through on [VM.build](VM.build) for more details on this phase.
3. Apply the `cpuid` configuration
4. Store the current domain configuration on disk. It is important to know
the difference between the configuration you started with and the configuration
you would use after a reboot, because some properties (such as maximum memory
and vCPUs) are fixed on create.
24 changes: 24 additions & 0 deletions doc/content/xenopsd/walkthroughs/VM.build/_index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
---
title: Building a VM
description: After VM_create, VM_build builds the core of the domain (vCPUs, memory)
weight: 20
---

Walk-through documents for the `VM_build` phase:

```mermaid
flowchart
subgraph xenopsd VM_build[xenopsd VM_build micro#8209;op]
direction LR
VM_build --> VM.build
VM.build --> VM.build_domain
VM.build_domain --> VM.build_domain_exn
VM.build_domain_exn --> Domain.build
click VM_build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2255-L2271" _blank
click VM.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2290-L2291" _blank
click VM.build_domain "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2250-L2288" _blank
click VM.build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248" _blank
end
```

{{% children description=true %}}