Merge master branch to go_sdk feature branch #5678

Merged
44 commits

- 62de904 opam: Fix metadata (psafont, May 28, 2024)
- c6e8a23 opam: de-templatise message-switch-core (psafont, May 29, 2024)
- 842ce9f ocaml: remove unused bindings (psafont, May 29, 2024)
- 5ab3899 dune: enforce version +3 (psafont, May 30, 2024)
- 808a9c2 Add `VM.set_uefi_mode` API call (benjamreis, Apr 17, 2024)
- f6aad9b Add `VM.get_secureboot_readiness` API call (benjamreis, Apr 18, 2024)
- a7b9220 Add `Pool.get_guest_secureboot_readiness` API call (benjamreis, Apr 22, 2024)
- 7a14d45 Merge pull request #5566 from xcp-ng/sb-state-api (psafont, May 31, 2024)
- 5ec4c26 CP-49446: Update SR health to include new constructors (May 31, 2024)
- 2e7e318 doc: copy design documents from xapi-project.github.io (robhoes, May 17, 2024)
- 2b43520 doc: add info table to design docs (robhoes, May 31, 2024)
- 45b673f doc: style design doc index (robhoes, May 31, 2024)
- ea8bffb Merge pull request #5656 from psafont/meta (psafont, May 31, 2024)
- de8f720 CP-49647 use URI for create_misc (May 22, 2024)
- b905868 CP-49647 use URI for dbsync_master (May 22, 2024)
- 06c10bf CP-49647 use URI for export.ml (May 22, 2024)
- 2b8b371 CP-49647 use URI for import.ml (May 22, 2024)
- 0f335a6 CP-49647 use URI for importexport.ml (May 22, 2024)
- 4dbf832 CP-49647 use URI for rrd_proxy.ml (May 23, 2024)
- 8cc0f48 CP-49647 use URI for sm_fs_ops.ml (May 23, 2024)
- c38d52c CP-49647 use URI for xapi_message.ml (May 23, 2024)
- 52f297b CP-49647 use URI for xapi_xenops.ml (May 23, 2024)
- d03a2cd CP-49647 use URI for xapi_vm_migrate.ml (May 23, 2024)
- 1c8efd5 CP-49647 use URI for xapi_host.ml (May 23, 2024)
- 561ec18 CP-49647 use URI for cli_util.ml (May 24, 2024)
- 23cab04 CP-49647 use URI for http.ml (May 23, 2024)
- 1c4f3a9 CP-49647 use URI for cli_operations (May 22, 2024)
- dab475d CP-45235: Support for `xe-cli` to transmit `traceparent` (GabrielBuica, May 13, 2024)
- f2a78b5 doc: add design review links (historical) (robhoes, Jun 4, 2024)
- 52ffb8b doc: RDP design: fix list nesting (robhoes, Jun 4, 2024)
- 4877b1b Merge pull request #5664 from robhoes/design-docs (robhoes, Jun 4, 2024)
- e5bb639 CP-48995: Instrument `XenAPI.py` to submit traceparent (GabrielBuica, May 16, 2024)
- adba6ee Update datamodel_lifecycle.ml (Jun 5, 2024)
- c39726e CP-49249: Implement SMAPIv3 CBT Forwarding (Jun 5, 2024)
- f970909 CA-393866: Add support for Infinity in Java SDK parser (danilo-delbusso, Jun 5, 2024)
- f12b9b2 CA-393507: Default cluster_stack value (Vincent-lau, Jun 4, 2024)
- 99e05fb Merge pull request #5639 from GabrielBuica/private/dbuica/CP-48995 (robhoes, Jun 5, 2024)
- 190df63 Merge pull request #5633 from GabrielBuica/private/dbuica/CP-45235 (psafont, Jun 6, 2024)
- e802010 Merge pull request #5675 from contificate/private/cbarr/CP-49249 (Jun 6, 2024)
- ff0c97e Merge pull request #5674 from Vincent-lau/private/shul2/cluster-stack… (minglumlu, Jun 6, 2024)
- cb0e550 Remove fix_firewall.sh (rosslagerwall, Jun 7, 2024)
- 399595e Merge pull request #5673 from danilo-delbusso/bug/infinity_CA-393866 (kc284, Jun 10, 2024)
- e92064a Merge pull request #5677 from rosslagerwall/private/rossla/firewall (robhoes, Jun 10, 2024)
- b816e00 Merge branch 'master' into private/mingl/merge_master_to_feature (minglumlu, Jun 11, 2024)
73 changes: 73 additions & 0 deletions doc/assets/css/misc.css
@@ -0,0 +1,73 @@
.revision-table {
width: 50%;
margin: 1em auto 1em auto;
font-size: 80%;
}

.label {
display: inline;
padding: .2em .6em .3em;
font-size: 75%;
font-weight: 700;
line-height: 1;
color: #fff;
text-align: center;
white-space: nowrap;
vertical-align: baseline;
border-radius: .25em;
}

.label.label-default {
background-color: #777;
}

.label.label-info {
background-color: #5bc0de;
}

.label.label-danger {
background-color: #d9534f;
}

.label.label-warning {
background-color: #f0ad4e;
}

.label.label-success {
background-color: #5cb85c;
}

.table-condensed > thead > tr > th,
.table-condensed > tbody > tr > th,
.table-condensed > tfoot > tr > th,
.table-condensed > thead > tr > td,
.table-condensed > tbody > tr > td,
.table-condensed > tfoot > tr > td {
  padding: 5px;
}

.table-striped > tbody > tr:nth-child(odd) {
background-color: #f9f9f9;
}

.btn {
display: inline-block;
padding: 6px 12px;
margin-bottom: 0;
font-weight: normal;
line-height: 1.42857143;
text-align: center;
white-space: nowrap;
vertical-align: middle;
cursor: pointer;
background-image: none;
border: 1px solid transparent;
border-radius: 4px;
}

.btn-link {
font-weight: normal;
color: #337ab7;
border-radius: 0;
}
98 changes: 98 additions & 0 deletions doc/content/design/RDP.md
@@ -0,0 +1,98 @@
---
title: RDP control
layout: default
design_doc: true
revision: 2
status: released (XenServer 6.5 SP1)
design_review: 12
---
### Purpose

To administer guest VMs it can be useful to connect to them over Remote Desktop Protocol (RDP). XenCenter supports this; it has an integrated RDP client.

First it is necessary to turn on the RDP service in the guest.

This can be controlled from XenCenter. Several layers are involved. This description starts in the guest and works up the stack to XenCenter.

This feature was completed in the first quarter of 2015, and released in Service Pack 1 for XenServer 6.5.

### The guest agent

The XenServer guest agent installed in Windows VMs can turn the RDP service on and off, and can report whether it is running.

The guest agent is at https://github.com/xenserver/win-xenguestagent

Interaction with the agent is done through some Xenstore keys:

The guest agent running in domain N writes two xenstore nodes when it starts up:
* `/local/domain/N/control/feature-ts = 1`
* `/local/domain/N/control/feature-ts2 = 1`

This indicates support for the rest of the functionality described below.

(The "...ts2" flag is new for this feature; older versions of the guest agent wrote the "...ts" flag and had support for only a subset of the functionality (no firewall modification), and had a bug in updating `.../data/ts`.)

To indicate whether RDP is running, the guest agent writes the string "1" (running) or "0" (disabled) to xenstore node

`/local/domain/N/data/ts`.

It does this on start-up, and also in response to the deletion of that node.

The guest agent also watches xenstore node `/local/domain/N/control/ts` and turns RDP on and off in response to "1" or "0" (respectively) being written to that node. The agent acknowledges the request by deleting the node, and afterwards it deletes `/local/domain/N/data/ts`, thus triggering itself to update that node as described above.

When the guest agent turns the RDP service on/off, it also modifies the standard Windows firewall to allow/forbid incoming connections to the RDP port. This is the same as the firewall change that happens automatically when the RDP service is turned on/off through the standard Windows GUI.
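
For illustration only, the sketch below shows how a dom0 script could drive this xenstore protocol. It is not part of this design: it assumes the standard `xenstore-read`/`xenstore-write` command-line tools are available in dom0, and the domain ID and timeout are example values.

```python
# Illustrative only: drive the guest agent's xenstore protocol described above.
# Assumes the xenstore-read/xenstore-write CLI tools are present in dom0.
import subprocess
import time

def xs_read(path):
    out = subprocess.run(["xenstore-read", path], capture_output=True, text=True)
    return out.stdout.strip()

def xs_write(path, value):
    subprocess.run(["xenstore-write", path, value], check=True)

def request_rdp(domid, enable=True, timeout=30):
    wanted = "1" if enable else "0"
    # Ask the agent to change the RDP state; it acknowledges by deleting the
    # control node and then refreshing data/ts with the new state.
    xs_write(f"/local/domain/{domid}/control/ts", wanted)
    for _ in range(timeout):
        if xs_read(f"/local/domain/{domid}/data/ts") == wanted:
            return True
        time.sleep(1)
    return False
```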

### XAPI etc.

xenopsd sets up watches on xenstore nodes including the `control` tree and `data/ts`, and prompts xapi to react by updating the relevant VM guest metrics record, which is available through a XenAPI call.

XenAPI includes a new message (function call) which can be used to ask the guest agent to turn RDP on and off.

This is `VM.call_plugin` (analogous to `Host.call_plugin`) in the hope that it can be used for other purposes in the future, even though for now it does not really call a plugin.

To use it, supply `plugin="guest-agent-operation"` and either `fn="request_rdp_on"` or `fn="request_rdp_off"`.

See http://xapi-project.github.io/xen-api/classes/vm.html

The function strings are named with "request" (rather than, say, "enable_rdp" or "turn_rdp_on") to make it clear that xapi only makes a request of the guest: when one of these calls returns successfully this means only that the appropriate string (1 or 0) was written to the `control/ts` node and it is up to the guest whether it responds.
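
A minimal client-side sketch of this call, assuming the standard XenAPI.py bindings (the URL, credentials and VM name below are placeholders):

```python
# Minimal sketch, assuming the XenAPI.py bindings; all values are examples.
import XenAPI

session = XenAPI.Session("https://pool-master.example")
session.xenapi.login_with_password("root", "password")
try:
    vm = session.xenapi.VM.get_by_name_label("my-windows-vm")[0]
    # This is only a request: success means "1" was written to control/ts,
    # not that the guest has actually turned RDP on.
    session.xenapi.VM.call_plugin(vm, "guest-agent-operation", "request_rdp_on", {})
finally:
    session.xenapi.session.logout()
```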

### XenCenter

#### Behaviour on older XenServer versions that do not support RDP control

Note that the current behaviour depends on some global options: "Enable Remote Desktop console scanning" and "Automatically switch to the Remote Desktop console when it becomes available".

1. When tools are not installed:
* As of XenCenter 6.5, the RDP button is absent.
2. When tools are installed but RDP is not switched on in the guest:
1. If "Enable Remote Desktop console scanning" is on:
* The RDP button is present but greyed out. (It seems to sometimes read "Switch to Remote Desktop" and sometimes read "Looking for guest console...": I haven't yet worked out the difference).
* We scan the RDP port to detect when RDP is turned on
2. If "Enable Remote Desktop console scanning" is off:
* The RDP button is enabled and reads "Switch to Remote Desktop"
3. When tools are installed and RDP is switched on in the guest:
1. If "Enable Remote Desktop console scanning" is on:
* The RDP button is enabled and reads "Switch to Remote Desktop"
* If "Automatically switch" is on, we switch to RDP immediately we detect it
2. If "Enable Remote Desktop console scanning" is off:
* As above, the RDP button is enabled and reads "Switch to Remote Desktop"

#### New behaviour on XenServer versions that support RDP control

1. This new XenCenter behaviour is only for XenServer versions that support RDP control, with guests with the new guest agent: behaviour must be unchanged if the server or guest-agent is older.
2. There should be no change in the behaviour for Linux guests, either PV or HVM varieties: this must be tested.
3. We should never scan the RDP port; instead we should watch for a change in the relevant variable in guest_metrics.
4. The XenCenter option "Enable Remote Desktop console scanning" should change to read "Enable Remote Desktop console scanning (XenServer 6.5 and earlier)"
5. The XenCenter option "Automatically switch to the Remote Desktop console when it becomes available" should be enabled even when "Enable Remote Desktop console scanning" is off.
6. When tools are not installed:
* As above, the RDP button should be absent.
7. When tools are installed but RDP is not switched on in the guest:
* The RDP button should be enabled and read "Turn on Remote Desktop"
* If pressed, it should launch a dialog with the following wording: "Would you like to turn on Remote Desktop in this VM, and then connect to it over Remote Desktop? [Yes] [No]"
* Choosing Yes should turn on RDP, wait for RDP to become enabled, and switch to an RDP connection. It should do this even if "Automatically switch" is off.
8. When tools are installed and RDP is switched on in the guest:
* The RDP button should be enabled and read "Switch to Remote Desktop"
* If "Automatically switch" is on, we should switch to RDP immediately
* There is no need for us to provide UI to switch RDP off again
9. We should also test the case where RDP has been switched on in the guest before the tools are installed.

6 changes: 6 additions & 0 deletions doc/content/design/_index.md
@@ -0,0 +1,6 @@
+++
title = "Design Documents"
menuTitle = "Designs"
+++

{{< design_docs_list >}}
67 changes: 67 additions & 0 deletions doc/content/design/aggr-storage-reboots.md
@@ -0,0 +1,67 @@
---
title: Aggregated Local Storage and Host Reboots
layout: default
design_doc: true
revision: 3
status: proposed
design_review: 144
revision_history:
- revision_number: 1
description: Initial version
- revision_number: 2
description: Included some open questions under Xapi point 2
- revision_number: 3
description: Added new error, task, and assumptions
---

## Introduction

When hosts use an aggregated local storage SR, disks are mirrored to several different hosts in the pool (RAID). This ensures that if a host goes down (e.g. due to a reboot after installing a hotfix or upgrade, or when "fenced" by the HA feature), all disk contents in the SR are still accessible. It also means that if all disks are mirrored to just two hosts (the worst-case scenario), only one host may be down at any point in time to keep the SR fully available.

When a node comes back up after a reboot, it resynchronises all its disks with the related mirrors on the other hosts in the pool. This syncing takes some time, and only after it is done may we consider the host "up" again and allow another host to be shut down.

Therefore, when installing a hotfix on a pool that uses aggregated local storage, or doing a rolling pool upgrade, we need to make sure that we handle hosts one by one, and that we wait for the storage syncing to finish before moving on to the next host.

This design aims to provide guidance and protection around this by blocking hosts from being shut down or rebooted through the XenAPI except when it is safe, and by setting the `host.allowed_operations` field accordingly.


## XenAPI

If an aggregated local storage SR is in use, and one of the hosts is rebooting or down (for whatever reason), or resynchronising its storage, the operations `reboot` and `shutdown` will be removed from the `host.allowed_operations` field of _all_ hosts in the pool that have a PBD for the SR.

This is a conservative approach: it assumes that this kind of SR tolerates only one node "failure", and it assumes no knowledge of how the SR distributes its mirrors. We may refine this in future, in order to allow some hosts to be down simultaneously.

The presence of the `reboot` operation in `host.allowed_operations` indicates whether the `host.reboot` XenAPI call is allowed or not (similarly for `shutdown` and `host.shutdown`). It will not, of course, prevent anyone from rebooting a host from the dom0 console or power switch.

Clients, such as XenCenter, can use `host.allowed_operations`, when applying an update to a pool, to guide them when it is safe to update and reboot the next host in the sequence.

In case `host.reboot` or `host.shutdown` is called while the storage is busy resyncing mirrors, the call will fail with a new error `MIRROR_REBUILD_IN_PROGRESS`.
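
As an illustration of how a client could consume this, the following sketch (assuming the XenAPI.py bindings and an already-authenticated `session`) waits for `reboot` to reappear in `host.allowed_operations` before rebooting the next host:

```python
# Sketch only: poll host.allowed_operations as described above before rebooting.
import time

def reboot_when_safe(session, host_ref, poll_seconds=30):
    while "reboot" not in session.xenapi.host.get_allowed_operations(host_ref):
        # e.g. another host is down or the mirrors are still resyncing
        time.sleep(poll_seconds)
    session.xenapi.host.reboot(host_ref)
```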

## Xapi

Xapi needs to be able to:

1. Determine whether aggregated local storage is in use; this just means that a PBD for such an SR is present.
* TBD: To avoid SR-specific code in xapi, the storage backend should tell us whether it is an aggregated local storage SR.
2. Determine whether the storage system is resynchronising its mirrors; it will need to be able to query the storage backend for this kind of information.
* Xapi will poll for this and will reflect that a resync is happening by creating a `Task` for it (in the DB). This task can be used to track progress, if available.
* The exact way to get the syncing information from the storage backend is SR specific. The check may be implemented in a separate script or binary that xapi calls from the polling thread. Ideally this would be integrated with the storage backend.
3. Update `host.allowed_operations` for all hosts in the pool according to the rules described above. This comes down to updating the function `valid_operations` in `xapi_host_helpers.ml`, and will need to use a combination of the functionality from the two points above, plus an indication of host liveness from `host_metrics.live`.
4. Trigger an update of the allowed operations when a host shuts down or reboots (due to a XenAPI call or otherwise), and when it has finished resynchronising when back up. Triggers must be in the following places (some may already be present, but are listed for completeness, and to confirm this):
* Wherever `host_metrics.live` is updated to detect pool slaves going up and down (probably at least in `Db_gc.check_host_liveness` and `Xapi_ha`).
* Immediately when a `host.reboot` or `host.shutdown` call is executed: `Message_forwarding.Host.{reboot,shutdown,with_host_operation}`.
* When a storage resync is starting or finishing.

All of the above runs on the pool master (= SR master) only.

## Assumptions

The above will be safe if the storage cluster is equal to the XenServer pool. In general, however, it may be desirable to have a storage cluster that is larger than the pool, have multiple XS pools on a single cluster, or even share the cluster with other kinds of nodes.

To ensure that the storage is "safe" in these scenarios, xapi needs to be able to ask the storage backend:

1. if a mirror is being rebuilt "somewhere" in the cluster, AND
2. if "some node" in the cluster is offline (even if the node is not in the XS pool).

If the cluster is equal to the pool, then xapi can do point 2 without asking the storage backend, which will simplify things. For the moment, we assume that the storage cluster is equal to the XS pool, to avoid making things too complicated (while still keeping in mind that we may change this in future).

95 changes: 95 additions & 0 deletions doc/content/design/archival-redesign.md
@@ -0,0 +1,95 @@
---
title: RRDD archival redesign
layout: default
design_doc: true
revision: 1
status: released (7.0)
---

## Introduction

Current problems with rrdd:

* rrdd stores knowledge about whether it is running on a master or a slave

This determines the host to which rrdd will archive a VM's rrd when the VM's
domain disappears - rrdd will always try to archive to the master. However,
when a host joins a pool as a slave, rrdd is not restarted, so this knowledge
is out of date. When a VM shuts down on the slave, rrdd will archive the rrd
locally. When this VM is started again, the master xapi will attempt to push
any locally-existing rrd to the host on which the VM is being started, but
since no rrd archive exists on the master, the slave rrdd will end up creating
a new rrd and the previous rrd will be lost.

* rrdd handles rebooting VMs unpredictably

When rebooting a VM, there is a chance rrdd will attempt to update that VM's rrd
during the brief period when there is no domain for that VM. If this happens,
rrdd will archive the VM's rrd to the master, and then create a new rrd for the
VM when it sees the new domain. If rrdd doesn't attempt to update that VM's rrd
during this period, rrdd will continue to add data for the new domain to the old
rrd.

## Proposal

To solve these problems, we will remove some of the intelligence from rrdd and
make it into more of a slave process of xapi. This will entail removing all
knowledge from rrdd of whether it is running on a master or a slave, and also
modifying rrdd to only start monitoring a VM when it is told to, and only
archiving an rrd (to a specified address) when it is told to. This matches the
way xenopsd only manages domains which it has been told to manage.

## Design

For most VM lifecycle operations, xapi and rrdd processes (sometimes across more
than one host) cooperate to start or stop recording a VM's metrics and/or to
restore or backup the VM's archived metrics. Below we will describe, for each
relevant VM operation, how the VM's rrd is currently handled, and how we propose
it will be handled after the redesign.

#### VM.destroy

The master xapi makes a remove_rrd call to the local rrdd, which causes rrdd
to delete the VM's archived rrd from disk. This behaviour will remain unchanged.

#### VM.start(\_on) and VM.resume(\_on)

The master xapi makes a push_rrd call to the local rrdd, which causes rrdd to
send any locally-archived rrd for the VM in question to the rrdd of the host on
which the VM is starting. This behaviour will remain unchanged.

#### VM.shutdown and VM.suspend

Every update cycle rrdd compares its list of registered VMs to the list of
domains actually running on the host. Any registered VMs which do not have a
corresponding domain have their rrds archived to the rrdd running on the host
believed to be the master. We will change this behaviour by stopping rrdd from
doing the archiving itself; instead we will expose a new function in rrdd's
interface:

```
val archive_rrd : vm_uuid:string -> remote_address:string -> unit
```

This will cause rrdd to remove the specified rrd from its table of registered
VMs, and archive the rrd to the specified host. When a VM has finished shutting
down or suspending, the xapi process on the host on which the VM was running
will call archive_rrd to ask the local rrdd to archive back to the master rrdd.

#### VM.reboot

Removing rrdd's ability to automatically archive the rrds for disappeared
domains will have the bonus effect of fixing how the rrds of rebooting VMs are
handled, as we don't want the rrds of rebooting VMs to be archived at all.

#### VM.checkpoint

This will be handled automatically, as internally VM.checkpoint carries out a
VM.suspend followed by a VM.resume.

#### VM.pool_migrate and VM.migrate_send

The source host's xapi makes a migrate_rrd call to the local rrdd, with a
destination address and an optional session ID. The session ID is only required
for cross-pool migration. The local rrdd sends the rrd for that VM to the
destination host's rrdd as an HTTP PUT. This behaviour will remain unchanged.
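
Purely as an illustration of the shape of such a transfer, the sketch below performs an HTTP PUT of an rrd; the URI path, query parameters and status handling are hypothetical and do not reflect rrdd's actual wire protocol.

```python
# Hypothetical sketch of an rrd hand-off as an HTTP PUT; path and parameters
# are invented for illustration and are not rrdd's real interface.
import http.client
import urllib.parse

def put_rrd(dest_address, vm_uuid, rrd_xml, session_id=None):
    query = {"uuid": vm_uuid}
    if session_id is not None:
        query["session_id"] = session_id   # only needed for cross-pool migration
    path = "/put_rrd?" + urllib.parse.urlencode(query)   # hypothetical path
    conn = http.client.HTTPConnection(dest_address)
    conn.request("PUT", path, body=rrd_xml,
                 headers={"Content-Type": "application/xml"})
    return conn.getresponse().status == 200
```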