diff --git a/doc/content/xenopsd/walkthroughs/VM.build/Domain.build.md b/doc/content/xenopsd/walkthroughs/VM.build/Domain.build.md
new file mode 100644
index 00000000000..8514e13eefd
--- /dev/null
+++ b/doc/content/xenopsd/walkthroughs/VM.build/Domain.build.md
@@ -0,0 +1,137 @@
+---
+title: Domain.build
+description:
+  "Prepare the build of a VM: Wait for scrubbing, do NUMA placement, run xenguest."
+---
+
+## Overview
+
+```mermaid
+flowchart LR
+subgraph xenopsd VM_build[
+    xenopsd thread pool with two VM_build micro#8209;ops:
+    during a parallel VM_start, many threads run this in parallel!
+]
+direction LR
+build_domain_exn[
+    VM.build_domain_exn
+    from thread pool Thread #1
+] --> Domain.build
+Domain.build --> build_pre
+build_pre --> wait_xen_free_mem
+build_pre -->|if NUMA/Best_effort| numa_placement
+Domain.build --> xenguest[Invoke xenguest]
+click Domain.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210" _blank
+click build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225" _blank
+click wait_xen_free_mem "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272" _blank
+click numa_placement "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897" _blank
+click build_pre "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964" _blank
+click xenguest "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1139-L1146" _blank
+
+build_domain_exn2[
+    VM.build_domain_exn
+    from thread pool Thread #2] --> Domain.build2[Domain.build]
+Domain.build2 --> build_pre2[build_pre]
+build_pre2 --> wait_xen_free_mem2[wait_xen_free_mem]
+build_pre2 -->|if NUMA/Best_effort| numa_placement2[numa_placement]
+Domain.build2 --> xenguest2[Invoke xenguest]
+click Domain.build2 
"https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210" _blank +click build_domain_exn2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225" _blank +click wait_xen_free_mem2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272" _blank +click numa_placement2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897" _blank +click build_pre2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964" _blank +click xenguest2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1139-L1146" _blank +end +``` + +[`VM.build_domain_exn`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248) +[calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225) +[`Domain.build`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210) +to call: +- `build_pre` to prepare the build of a VM: + - If the `xe` config `numa_placement` is set to `Best_effort`, invoke the NUMA placement algorithm. + - Run `xenguest` +- `xenguest` to invoke the [xenguest](xenguest) program to setup the domain's system memory. + +## Domain Build Preparation using build_pre + +[`Domain.build`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210) +[calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1137) +the [function `build_pre`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964) +(which is also used for VM restore). It must: + +1. 
[Call](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L902-L911) + [wait_xen_free_mem](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272) + to wait, if necessary, for the Xen memory scrubber to catch up reclaiming memory (CA-39743) +2. Call the hypercall to set the timer mode +3. Call the hypercall to set the number of vCPUs +4. As described in the [NUMA feature description](../../toolstack/features/NUMA), + when the `xe` configuration option `numa_placement` is set to `Best_effort`, + except when the VM has a hard affinity set, invoke the `numa_placement` function: + + ```ml + match !Xenops_server.numa_placement with + | Any -> + () + | Best_effort -> + log_reraise (Printf.sprintf "NUMA placement") (fun () -> + if has_hard_affinity then + D.debug "VM has hard affinity set, skipping NUMA optimization" + else + numa_placement domid ~vcpus + ~memory:(Int64.mul memory.xen_max_mib 1048576L) + ) + ``` + +## NUMA placement + +`build_pre` passes the `domid`, the number of `vCPUs` and `xen_max_mib` to the +[numa_placement](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897) +function to run the algorithm to find the best NUMA placement. + +When it returns a NUMA node to use, it calls the Xen hypercalls +to set the vCPU affinity to this NUMA node: + +```ml + let vm = NUMARequest.make ~memory ~vcpus in + let nodea = + match !numa_resources with + | None -> + Array.of_list nodes + | Some a -> + Array.map2 NUMAResource.min_memory (Array.of_list nodes) a + in + numa_resources := Some nodea ; + Softaffinity.plan ~vm host nodea +``` + +By using the default `auto_node_affinity` feature of Xen, +setting the vCPU affinity causes the Xen hypervisor to activate +NUMA node affinity for memory allocations to be aligned with +the vCPU affinity of the domain. 
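The `Array.map2 NUMAResource.min_memory` step above can be illustrated with a small self-contained sketch (the `combine` and `min_memory` names here are illustrative stand-ins, not the real `Topology` API): because several VM builds may run in parallel and each claims memory that Xen has not yet handed out, xenopsd keeps the most conservative per-node free-memory estimate.

```ml
(* Illustrative sketch only: combine a cached per-NUMA-node free-memory
   estimate with a fresh reading, keeping the smaller (more conservative)
   value per node, like the [Array.map2 NUMAResource.min_memory] step. *)
let min_memory (a : int64) (b : int64) : int64 = min a b

let combine cached fresh =
  match cached with
  | None -> fresh (* first build since the cache was reset *)
  | Some prev -> Array.map2 min_memory fresh prev

let () =
  let fresh = [|4096L; 2048L|] (* free MiB per node, fresh reading *) in
  let cached = Some [|1024L; 3072L|] (* estimate after earlier claims *) in
  assert (combine cached fresh = [|1024L; 2048L|]) ;
  assert (combine None fresh = fresh)
```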
+
+Note: See the Xen domain's
+[auto_node_affinity](https://wiki.xenproject.org/wiki/NUMA_node_affinity_in_the_Xen_hypervisor)
+feature flag, which controls this behaviour and can be overridden in the
+Xen hypervisor for specific VMs if needed.
+
+Overriding it can be useful, for example, when there might not be enough memory
+on the preferred NUMA node, but other NUMA nodes have enough free
+memory among which the allocations shall be done.
+
+In terms of future NUMA design, it might be even more favourable to
+have a strategy in `xenguest` where, in such cases, the superpages
+of the preferred node are used first and a fallback to neighbouring
+NUMA nodes only happens to the extent necessary.
+
+Likely, the future allocation strategy should be passed to `xenguest`
+using the Xenstore, like the other platform parameters for the VM.
+
+Summary: This tells the hypervisor that memory
+allocation for this domain should preferably be done from this NUMA node.
+
+## Invoke the xenguest program
+
+With the preparation in `build_pre` completed, `Domain.build`
+[calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1127-L1155)
+the `xenguest` function to invoke the [xenguest](xenguest) program to build the domain.
diff --git a/doc/content/xenopsd/walkthroughs/VM.build/VM_build.md b/doc/content/xenopsd/walkthroughs/VM.build/VM_build.md
new file mode 100644
index 00000000000..c488d9b7c1c
--- /dev/null
+++ b/doc/content/xenopsd/walkthroughs/VM.build/VM_build.md
@@ -0,0 +1,58 @@
+---
+title: VM_build micro-op
+linkTitle: VM_build μ-op
+description: Overview of the VM_build μ-op (runs after the VM_create μ-op created the domain).
+weight: 10
+---
+
+## Overview
+
+On Xen, `Xenctrl.domain_create` creates an empty domain and
+returns the domain ID (`domid`) of the new domain to `xenopsd`.
+
+In the `build` phase, the `xenguest` program is called to create
+the system memory layout of the domain, set vCPU affinity and a
+lot more.
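To make the create/build split concrete, here is a toy model (purely illustrative, not the real Xenctrl bindings): `VM_create` only yields an empty domain identified by a `domid`; only the build step gives it a memory image.

```ml
(* Toy model of the create/build split (hypothetical types, not Xenctrl):
   domain_create reserves a domid for an empty domain; build populates it. *)
type domain = {domid: int; mutable built: bool}

let next_domid = ref 1

let domain_create () =
  let d = {domid= !next_domid; built= false} in
  incr next_domid ; d

let build d = d.built <- true

let () =
  let d = domain_create () in
  assert (d.domid = 1) ;
  assert (not d.built) ; (* empty domain after VM_create *)
  build d ;
  assert d.built (* memory layout exists only after VM_build *)
```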
+
+The [VM_build](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2255-L2271)
+micro-op collects the VM build parameters and calls
+[VM.build](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2290-L2291),
+which calls
+[VM.build_domain](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2250-L2288),
+which calls
+[VM.build_domain_exn](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248),
+which calls [Domain.build](Domain.build):
+
+```mermaid
+flowchart
+subgraph xenopsd VM_build[xenopsd VM_build micro#8209;op]
+direction LR
+VM_build --> VM.build
+VM.build --> VM.build_domain
+VM.build_domain --> VM.build_domain_exn
+VM.build_domain_exn --> Domain.build
+click VM_build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2255-L2271" _blank
+click VM.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2290-L2291" _blank
+click VM.build_domain "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2250-L2288" _blank
+click VM.build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248" _blank
+click Domain.build "../Domain.build/index.html"
+end
+```
+
+The function
+[VM.build_domain_exn](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024)
+must:
+
+1. Run pygrub (or eliloader) to extract the kernel and initrd, if necessary
+2. [Call](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225)
+   [Domain.build](Domain.build)
+   to:
+   - optionally run NUMA placement and
+   - invoke [xenguest](VM.build/xenguest) to set up the domain memory.
+
+   See the walk-through on [VM.build](VM.build) for more details on this phase.
+3. 
Apply the `cpuid` configuration
+4. Store the current domain configuration on disk -- it's important to know
+   the difference between the configuration you started with and the configuration
+   you would use after a reboot because some properties (such as maximum memory
+   and vCPUs) are fixed on create.
diff --git a/doc/content/xenopsd/walkthroughs/VM.build/_index.md b/doc/content/xenopsd/walkthroughs/VM.build/_index.md
new file mode 100644
index 00000000000..0a5d73d70cf
--- /dev/null
+++ b/doc/content/xenopsd/walkthroughs/VM.build/_index.md
@@ -0,0 +1,24 @@
+---
+title: Building a VM
+description: After VM_create, VM_build builds the core of the domain (vCPUs, memory)
+weight: 20
+---
+
+Walk-through documents for the `VM_build` phase:
+
+```mermaid
+flowchart
+subgraph xenopsd VM_build[xenopsd VM_build micro#8209;op]
+direction LR
+VM_build --> VM.build
+VM.build --> VM.build_domain
+VM.build_domain --> VM.build_domain_exn
+VM.build_domain_exn --> Domain.build
+click VM_build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2255-L2271" _blank
+click VM.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2290-L2291" _blank
+click VM.build_domain "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2250-L2288" _blank
+click VM.build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248" _blank
+end
+```
+
+{{% children description=true %}}
diff --git a/doc/content/xenopsd/walkthroughs/VM.build/xenguest.md b/doc/content/xenopsd/walkthroughs/VM.build/xenguest.md
new file mode 100644
index 00000000000..66345f018ec
--- /dev/null
+++ b/doc/content/xenopsd/walkthroughs/VM.build/xenguest.md
@@ -0,0 +1,184 @@
+---
+title: xenguest
+description:
+  "Build VMs: Allocate and populate the domain's system memory."
+--- + +# Flowchart + +xenguest is called as part of starting a new domain in VM_build: + +```mermaid +flowchart +subgraph xenopsd VM_build[xenopsd VM_build micro#8209;ops] +direction LR +xenopsd1[Domain.build - Thread #1] --> xenguest1[xenguest #1] +xenopsd2[Domain.build - Thread #2] --> xenguest2[xenguest #2] +xenguest1 --> libxenguest +xenguest2 --> libxenguest2[libxenguest] +click xenopsd1 "../Domain.build/index.html" +click xenopsd2 "../Domain.build/index.html" +click xenguest1 "https://github.com/xenserver/xen.pg/blob/XS-8/patches/xenguest.patch" _blank +click xenguest2 "https://github.com/xenserver/xen.pg/blob/XS-8/patches/xenguest.patch" _blank +click libxenguest "https://github.com/xen-project/xen/tree/master/tools/libs/guest" _blank +click libxenguest2 "https://github.com/xen-project/xen/tree/master/tools/libs/guest" _blank +libxenguest --> Xen[Xen
Hypervisor]
+libxenguest2 --> Xen
+end
+```
+
+# About xenguest
+`xenguest` is called by the xenopsd [Domain.build](Domain.build) function
+to perform the build phase for new VMs, which is part of the `xenopsd`
+[VM.start operation](VM.start).
+
+[xenguest](https://github.com/xenserver/xen.pg/blob/XS-8/patches/xenguest.patch)
+was created as a separate program due to issues with
+`libxenguest`:
+
+- It wasn't threadsafe: fixed, but it still uses a per-call global struct
+- It had an incompatible licence: it is now licensed under the LGPL.
+
+Although these issues were fixed, we still shell out to `xenguest`, which is currently
+carried in the patch queue for the Xen hypervisor packages, but could become
+an individual package once planned changes to the Xen hypercalls are stabilised.
+
+Over time, `xenguest` has evolved to build more of the initial domain state.
+
+# Interface to xenguest
+
+```mermaid
+flowchart
+subgraph xenopsd VM_build[xenopsd VM_build micro#8209;op]
+direction TB
+mode
+domid
+memmax
+Xenstore
+end
+mode[--mode hvm_build] --> xenguest
+domid --> xenguest
+memmax --> xenguest
+Xenstore[Xenstore platform data] --> xenguest
+```
+
+`xenopsd` must pass this information to `xenguest` to build a VM:
+
+- The domain type to build for (HVM, PVH or PV).
+  - It is passed using the command line option `--mode` (e.g. `--mode hvm_build` for an HVM domain).
+- The `domid` of the created empty domain,
+- The amount of system memory of the domain,
+- A number of other parameters that are domain-specific.
+
+The platform data (vCPUs, vCPU affinity, etc.) is passed using the Xenstore:
+- the vCPU affinity
+- the vCPU credit2 weight/cap parameters
+- whether the NX bit is exposed
+- whether the viridian CPUID leaf is exposed
+- whether the system has PAE or not
+- whether the system has ACPI or not
+- whether the system has nested HVM or not
+- whether the system has an HPET or not
+
+When called to build a domain, `xenguest` reads those and builds the VM accordingly.
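Putting the interface above together, here is a sketch of how such a command line could be assembled. This is only an illustration: `--mode` and its values are documented above, while the `--domid` and `--mem_max_mib` flag names are hypothetical placeholders for the remaining parameters (the real invocation lives in `domain.ml`'s `xenguest` function).

```ml
(* Sketch: build a xenguest-style argument list from the inputs named in
   the text. Flag names other than --mode are hypothetical placeholders. *)
let xenguest_args ~domain_type ~domid ~mem_max_mib =
  let mode =
    match domain_type with
    | `HVM -> "hvm_build"
    | `PVH -> "pvh_build"
    | `PV -> "pv_build"
  in
  [ "--mode"; mode
  ; "--domid"; string_of_int domid
  ; "--mem_max_mib"; Int64.to_string mem_max_mib ]

let () =
  assert (
    xenguest_args ~domain_type:`HVM ~domid:42 ~mem_max_mib:2048L
    = ["--mode"; "hvm_build"; "--domid"; "42"; "--mem_max_mib"; "2048"])
```

The remaining platform parameters are not on the command line at all; as described above, `xenguest` reads them from the Xenstore.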
+
+## Walkthrough of the xenguest build mode
+
+```mermaid
+flowchart
+subgraph xenguest[xenguest #8209;#8209;mode hvm_build domid]
+direction LR
+stub_xc_hvm_build[stub_xc_hvm_build#40;#41;] --> get_flags[
+    get_flags#40;#41; <#8209; Xenstore platform data
+]
+stub_xc_hvm_build --> configure_vcpus[
+    configure_vcpus#40;#41; #8209;> Xen hypercall
+]
+stub_xc_hvm_build --> setup_mem[
+    setup_mem#40;#41; #8209;> Xen hypercalls to setup domain memory
+]
+end
+```
+
+Based on the given domain type, the `xenguest` program calls a dedicated
+build function for that domain type:
+
+- `stub_xc_hvm_build()` for HVM,
+- `stub_xc_pvh_build()` for PVH, and
+- `stub_xc_pv_build()` for PV domains.
+
+Each of these build functions calls:
+
+1. `get_flags()` to get the platform data from the Xenstore
+2. `configure_vcpus()`, which uses the platform data from the Xenstore to configure vCPU affinity and the credit scheduler parameters vCPU weight and vCPU cap (max % pCPU time for throttling)
+3. The `setup_mem` function for the given VM type.
+
+## The function hvm_build_setup_mem()
+
+For HVM domains, `hvm_build_setup_mem()` is responsible for deriving the memory
+layout of the new domain, and for allocating and populating the required memory for the
+new domain. It must:
+
+1. Derive the `e820` memory layout of the system memory of the domain
+   including memory holes depending on PCI passthrough and vGPU flags.
+2. Load the BIOS/UEFI firmware images
+3. Store the final MMIO hole parameters in the Xenstore
+4. Call the `libxenguest` function `xc_dom_boot_mem_init()` (see below)
+5. 
Call `construct_cpuid_policy()` to apply the CPUID `featureset` policy
+
+## The function xc_dom_boot_mem_init()
+
+```mermaid
+flowchart LR
+subgraph xenguest
+hvm_build_setup_mem[hvm_build_setup_mem#40;#41;]
+end
+subgraph libxenguest
+hvm_build_setup_mem --> xc_dom_boot_mem_init[xc_dom_boot_mem_init#40;#41;]
+xc_dom_boot_mem_init -->|vmemranges| meminit_hvm[meminit_hvm#40;#41;]
+click xc_dom_boot_mem_init "https://github.com/xen-project/xen/blob/39c45c/tools/libs/guest/xg_dom_boot.c#L110-L126" _blank
+click meminit_hvm "https://github.com/xen-project/xen/blob/39c45c/tools/libs/guest/xg_dom_x86.c#L1348-L1648" _blank
+end
+```
+
+`hvm_build_setup_mem()` calls
+[xc_dom_boot_mem_init()](https://github.com/xen-project/xen/blob/39c45c/tools/libs/guest/xg_dom_boot.c#L110-L126)
+to allocate and populate the domain's system memory.
+
+It calls
+[meminit_hvm()](https://github.com/xen-project/xen/blob/39c45c/tools/libs/guest/xg_dom_x86.c#L1348-L1648)
+to loop over the `vmemranges` of the domain for mapping the system RAM
+of the guest from the Xen hypervisor heap. Its goals are:
+
+- Attempt to allocate 1GB superpages when possible
+- Fall back to 2MB pages when 1GB allocation fails
+- Fall back to 4k pages when both fail
+
+It uses the hypercall
+[XENMEM_populate_physmap](https://github.com/xen-project/xen/blob/39c45c/xen/common/memory.c#L1408-L1477)
+to perform memory allocation and to map the allocated memory
+to the system RAM ranges of the domain.
+
+`XENMEM_populate_physmap`:
+
+1. Uses
+   [construct_memop_from_reservation](https://github.com/xen-project/xen/blob/39c45c/xen/common/memory.c#L1022-L1071)
+   to convert the arguments for allocating a page from
+   [struct xen_memory_reservation](https://github.com/xen-project/xen/blob/master/xen/include/public/memory.h#L46-L80)
+   to `struct memop_args`.
+2. Sets flags and calls functions according to the arguments
+3. 
Allocates the requested page at the most suitable place
+   - depending on the passed flags, on a specific NUMA node
+   - else, if the domain has node affinity, on the affine nodes
+   - also in the most suitable memory zone within the NUMA node
+4. Falls back to less desirable places if this fails
+   - or fails for "exact" allocation requests
+5. When no pages of the requested size are free,
+   it splits larger superpages into pages of the requested size.
+
+For more details on the VM build step involving `xenguest` and the Xen side, see:
+https://wiki.xenproject.org/wiki/Walkthrough:_VM_build_using_xenguest
diff --git a/doc/content/xenopsd/walkthroughs/VM.migrate.md b/doc/content/xenopsd/walkthroughs/VM.migrate.md
index 080ebdb8edc..e3517ab3f0f 100644
--- a/doc/content/xenopsd/walkthroughs/VM.migrate.md
+++ b/doc/content/xenopsd/walkthroughs/VM.migrate.md
@@ -1,5 +1,8 @@
 ---
 title: 'Walkthrough: Migrating a VM'
+linkTitle: 'Migrating a VM'
+description: Walkthrough of migrating a VM from one host to another.
+weight: 50
 ---
 
 A XenAPI client wishes to migrate a VM from one host to another within
diff --git a/doc/content/xenopsd/walkthroughs/VM.start.md b/doc/content/xenopsd/walkthroughs/VM.start.md
index 52201fd7218..6a12e9d9c60 100644
--- a/doc/content/xenopsd/walkthroughs/VM.start.md
+++ b/doc/content/xenopsd/walkthroughs/VM.start.md
@@ -1,5 +1,8 @@
 ---
 title: 'Walkthrough: Starting a VM'
+linkTitle: 'Starting a VM'
+description: Complete walkthrough of starting a VM, from receiving the request to unpause.
+weight: 10
 ---
 
 A Xenopsd client wishes to start a VM. They must first tell Xenopsd the VM
@@ -89,7 +92,7 @@ exist for:
 From here we shall assume the use of the "Xen via libxc, libxenguest and
 xenstore" (a.k.a. "Xenopsd classic") backend.
-The backend [VM.add](https://github.com/xapi-project/xenopsd/blob/2a476c132c0b5732f9b224316b851a1b4d57520b/xc/xenops_server_xen.ml#L719)
+The backend [VM.add](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L1603-L1659)
 function checks whether the VM we have to manage already exists -- and if it
 does then it ensures the Xenstore configuration is intact. This Xenstore
 configuration is important because at any time a client can query the state of
 a VM with
@@ -196,24 +199,43 @@ takes care of:
 Once a thread from the worker pool becomes free, it will execute the "do it
 now" function. In the example above this is `perform op t` where `op` is
 `VM_start vm` and `t` is the Task. The function
-[perform](https://github.com/xapi-project/xenopsd/blob/524d57b3c70/lib/xenops_server.ml#L1198)
+[perform_exn](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2533)
 has fragments like this:
 
 ```ocaml
-  | VM_start id ->
-    debug "VM.start %s" id;
-    perform_atomics (atomics_of_operation op) t;
-    VM_DB.signal id
+  | VM_start (id, force) -> (
+      debug "VM.start %s (force=%b)" id force ;
+      let power = (B.VM.get_state (VM_DB.read_exn id)).Vm.power_state in
+      match power with
+      | Running ->
+          info "VM %s is already running" id
+      | _ ->
+          perform_atomics (atomics_of_operation op) t ;
+          VM_DB.signal id
+  )
 ```
 
 Each "operation" (e.g. `VM_start vm`) is decomposed into "micro-ops" by the
 function
-[atomics_of_operation](https://github.com/xapi-project/xenopsd/blob/524d57b3c70/lib/xenops_server.ml#L739)
+[atomics_of_operation](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L1583)
 where the micro-ops are small building-block actions common to the
 higher-level operations. Each operation corresponds to a list of "micro-ops",
 where there is no if/then/else.
Some of the "micro-ops" may be a no-op depending on the VM
 configuration (for example a PV domain may not need a qemu). In the case of
-`VM_start vm` this decomposes into the sequence:
+[`VM_start vm`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L1584)
+the `Xenopsd` server starts by calling the [functions that
+decompose](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L1612-L1714)
+the operation into micro-ops, starting with `VM_hook_script`, `VM_create` and `VM_build`:
+```ml
+  dequarantine_ops vgpus
+  ; [
+      VM_hook_script
+        (id, Xenops_hooks.VM_pre_start, Xenops_hooks.reason__none)
+    ; VM_create (id, None, None, no_sharept)
+    ; VM_build (id, force)
+    ]
+```
+This is the complete sequence of micro-ops:
 
 ## 1. run the "VM_pre_start" scripts
@@ -259,105 +281,10 @@ function must
 
 ## 3. build the domain
 
-On Xen, `Xenctrl.domain_create` creates an empty domain and
-returns the domain ID (`domid`) of the new domain to `xenopsd`.
-
-In the `build` phase, the `xenguest` program is called to create
-the system memory layout of the domain, set vCPU affinity and a
-lot more.
-
-The function
-[VM.build_domain_exn](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024)
-must
-
-1. run pygrub (or eliloader) to extract the kernel and initrd, if necessary
-2. invoke the *xenguest* binary to interact with libxenguest.
-3. apply the `cpuid` configuration
-4. store the current domain configuration on disk -- it's important to know
-   the difference between the configuration you started with and the configuration
-   you would use after a reboot because some properties (such as maximum memory
-   and vCPUs) as fixed on create.
- -### 3.1 Interface to xenguest for building domains - -[xenguest](https://github.com/xenserver/xen.pg/blob/XS-8/patches/xenguest.patch) -was originally created as a separate program due to issues with -`libxenguest` that were fixed, but we still shell out to `xenguest`: - -- Wasn't threadsafe: fixed, but it still uses a per-call global struct -- Incompatible licence, but now licensed under the LGPL. - -The `xenguest` binary has evolved to build more of the initial -domain state. `xenopsd` passes it: - -- The domain type to build for (HVM, PHV or PV), -- The `domid` of the created empty domain, -- The amount of system memory of the domain, -- The platform data (vCPUs, vCPU affinity, etc) using the Xenstore: - - the vCPU affinity - - the vCPU credit2 weight/cap parameters - - whether the NX bit is exposed - - whether the viridian CPUID leaf is exposed - - whether the system has PAE or not - - whether the system has ACPI or not - - whether the system has nested HVM or not - - whether the system has an HPET or not - -When called to build a domain, `xenguest` reads those and builds the VM accordingly. - -### 3.2 Workflow for allocating and populating domain memory - -Based on the given type, the `xenguest` program calls dedicated -functions for the build process of given domain type. - -- For HVM, this function is `stub_xc_hvm_build()`. - -These domain build functions call these functions: - -1. `get_flags()` to get the platform data from the Xenstore -2. `configure_vcpus()` which uses the platform data from the Xenstore to configure vCPU affinity and the credit scheduler parameters vCPU weight and vCPU cap (max % pCPU time for throttling) -3. For HVM, `hvm_build_setup_mem` to: - 1. Decide the `e820` memory layout of the system memory of the domain - including memory holes depending on PCI passthrough and vGPU flags. - 2. Load the BIOS/UEFI firmware images - 3. Store the final MMIO hole parameters in the Xenstore - 4. 
Call the `libxenguest` function - [xc_dom_boot_mem_init()](https://github.com/xen-project/xen/blob/39c45caef271bc2b2e299217450cfda24c0c772a/tools/libs/guest/xg_dom_boot.c#L110-L126) - to allocate and map the domain's system memory. - For HVM domains, it calls - [meminit_hvm()](https://github.com/xen-project/xen/blob/39c45caef271bc2b2e299217450cfda24c0c772a/tools/libs/guest/xg_dom_x86.c#L1348-L1648) - to loop over the `vmemranges` of the domain for mapping the system RAM - of the guest from the Xen hypervisor heap. Its goals are: - - - Attempt to allocate 1GB superpages when possible - - Fall back to 2MB pages when 1GB allocation failed - - Fall back to 4k pages when both failed - - It uses the hypercall - [XENMEM_populate_physmap()]( - https://github.com/xen-project/xen/blob/39c45caef271bc2b2e299217450cfda24c0c772a/xen/common/memory.c#L1408-L1477) - to perform memory allocation and to map the allocated memory - to the system RAM ranges of the domain. - The hypercall must: - - 1. convert the arguments for allocating a page to hypervisor structures - 2. set flags and calls functions according to the arguments - 3. allocate the requested page at the most suitable place - - - depending on passed flags, allocate on a specific NUMA node - - else, if the domain has node affinity, on the affine nodes - - also in the most suitable memory zone within the NUMA node - - 4. fall back to less desirable places if this fails - - - or fail for "exact" allocation requests - - 5. split superpages if pages of the requested size are not available - - 5. Call `construct_cpuid_policy()` to apply the `CPUID` `featureset` policy - - For more details on the VM build step involving xenguest and Xen side see: - https://wiki.xenproject.org/wiki/Walkthrough:_VM_build_using_xenguest +The `build` phase waits, if necessary, for the Xen memory scrubber to catch +up reclaiming memory, runs NUMA placement, sets vCPU affinity and invokes +the `xenguest` to build the system memory layout of the domain. 
+See the [walk-through of the VM_build μ-op](VM.build) for details.
 
 ## 4. mark each VBD as "active"
 
diff --git a/doc/content/xenopsd/walkthroughs/_index.md b/doc/content/xenopsd/walkthroughs/_index.md
index 2217b209ab6..6fe3f551f29 100644
--- a/doc/content/xenopsd/walkthroughs/_index.md
+++ b/doc/content/xenopsd/walkthroughs/_index.md
@@ -6,7 +6,7 @@ linkTitle = "Walk-throughs"
 Let's trace through interesting operations to see how the whole system works.
 
-{{% children %}}
+{{% children depth=2 description=true %}}
 
 Inspiration for other walk-throughs:
 
diff --git a/doc/content/xenopsd/walkthroughs/live-migration.md b/doc/content/xenopsd/walkthroughs/live-migration.md
index f0af797f85e..c6fa02d95fa 100644
--- a/doc/content/xenopsd/walkthroughs/live-migration.md
+++ b/doc/content/xenopsd/walkthroughs/live-migration.md
@@ -1,6 +1,7 @@
 +++
 title = "Live Migration Sequence Diagram"
 linkTitle = "Live Migration"
+description = "Sequence diagram of the process of Live Migration."
 +++
 
 {{}}
 
diff --git a/doc/hugo.toml b/doc/hugo.toml
index 9c58a1eea17..a35112db945 100644
--- a/doc/hugo.toml
+++ b/doc/hugo.toml
@@ -46,6 +46,9 @@ themeVariant = [
 ]
 # auto switches between "red" and "zen-dark" depending on the browser/OS dark mode:
 themeVariantAuto = ["red", "zen-dark"]
+# Consistency: Use the font of the Hugo Relearn theme also for Mermaid diagrams:
+# securityLevel=loose is the default of Relearn; it allows HTML links in diagrams:
+mermaidInitialize = '{ "fontFamily": "Roboto Flex", "securityLevel": "loose" }'
 alwaysopen = false
 collapsibleMenu = true