Merge master into feature/host-network-device-ordering #6520

Closed
wants to merge 84 commits into from

Conversation

changlei-li
Contributor

No description provided.

LunfanZhang and others added 30 commits March 27, 2025 01:33
…onfigure

Add new host object fields:
  - ssh_enabled
  - ssh_enabled_timeout
  - ssh_expiry
  - console_idle_timeout
Add new host/pool API to set a temporary SSH service enablement timeout
  - set_ssh_enabled_timeout
Add new host/pool API to set the console idle timeout
  - set_console_idle_timeout

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
This PR introduces support for Dom0 SSH control, providing the following
capabilities:

Query the SSH status.
Configure a temporary SSH enable timeout for a specific host or all
hosts in the pool.
Configure the console idle timeout for a specific host or all hosts in
the pool.

Changes
New Host Object Fields:

- `ssh_enabled`: Indicates whether SSH is enabled.
- `ssh_enabled_timeout`: Specifies the timeout for temporary SSH
enablement.
- `ssh_expiry`: Tracks the expiration time for temporary SSH enablement.
- `console_idle_timeout`: Configures the idle timeout for the console.

New Host/Pool APIs (this PR only includes the data model change; the
implementation of these APIs will be included in the next PR):

- `set_ssh_enabled_timeout`: Allows setting a temporary timeout for
enabling the SSH service.
- `set_console_idle_timeout`: Allows configuring the console idle
timeout.
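For illustration only, a minimal sketch (with a hypothetical record and
function name, not the actual xapi data model) of how a temporary
enablement timeout might translate into an `ssh_expiry` timestamp:

```ocaml
(* Hypothetical sketch, not the xapi implementation. *)
type host_ssh_state = {
  ssh_enabled : bool;
  ssh_enabled_timeout : int64; (* seconds; 0L means "no timeout" *)
  ssh_expiry : float;          (* absolute unix time; unused when timeout = 0 *)
}

let set_ssh_enabled_timeout state ~timeout ~now =
  if Int64.compare timeout 0L < 0 then
    invalid_arg "set_ssh_enabled_timeout: timeout must be non-negative"
  else
    {
      state with
      ssh_enabled_timeout = timeout;
      (* A timeout of 0 means SSH keeps its current state indefinitely. *)
      ssh_expiry =
        (if Int64.equal timeout 0L then infinity
         else now +. Int64.to_float timeout);
    }
```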
Add "Changed" records for 2 APIs which were missed.

Fix "param_release" for 3 added parameters.

Signed-off-by: Gang Ji <gang.ji@cloud.com>
During pool join, create a new host obj in the remote pool coordinator
DB with the same SSH settings as the pool coordinator.

Also configure the SSH service locally before the xapi restart, so the
configuration persists after the restart.

Signed-off-by: Gang Ji <gang.ji@cloud.com>
Useless here: the local DB will be dropped soon, as the joiner will
switch to the remote DB of the new coordinator.

And latest_synced_updates_applied will be set to `unknown` in host.create
in the remote DB as the default value.

Signed-off-by: Gang Ji <gang.ji@cloud.com>
After being ejected from a pool, a new host obj will be created with
default settings in the DB.

This commit configures the SSH service on the ejected host to its default
state during pool eject.

Signed-off-by: Gang Ji <gang.ji@cloud.com>
Signed-off-by: Gang Ji <gang.ji@cloud.com>
Implemented XAPI APIs:
  - `host.set_console_idle_timeout`
  - `pool.set_console_idle_timeout`

These APIs allow XAPI to configure the timeout for idle console sessions.

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
Implemented XAPI APIs:
  - `host.set_ssh_enabled_timeout`
  - `pool.set_ssh_enabled_timeout`
These APIs allow XAPI to configure the timeout for the SSH service.
`host.enable_ssh` now also supports enabling the SSH service with an ssh_enabled_timeout.

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
Updated the `records.ml` file to support `host-param-set/get/list` and `pool-param-set/get/list` for SSH-related fields.

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
Implemented XAPI APIs:
  - `set_ssh_enabled_timeout`
  - `set_console_idle_timeout`

These APIs allow XAPI to configure timeouts for the SSH service and idle
console sessions at both host and pool level.

Updated `records.ml` to support `host-param-set/get/list` and
`pool-param-set/get/list` for SSH-related fields.
The error set_console_idle_timeout_failed was added in the feature branch
but is not used anywhere. The error used by
set_console_idle_timeout is now invalid_value.

Signed-off-by: Gang Ji <gang.ji@cloud.com>
Signed-off-by: Zeroday BYTE <pwnosecauth@gmail.com>
 - Ensure host.ssh_enabled reflects the actual SSH service state on startup, in case it was manually changed by the user.

 - Reschedule the "disable SSH" job if:
   - host.ssh_enabled_timeout is set to a positive value, and
   - host.ssh_expiry is in the future.

 - Disable SSH if:
   - host.ssh_enabled_timeout is set to a positive value, and
   - host.ssh_expiry is in the past.

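A minimal sketch of this startup reconciliation (the helper functions are
assumed names, not the actual xapi code):

```ocaml
(* Hypothetical helpers: [schedule_disable_ssh_job] re-arms the background
   job, [disable_ssh] turns the service off immediately. *)
let reconcile_ssh_on_startup ~ssh_enabled_timeout ~ssh_expiry ~now
    ~schedule_disable_ssh_job ~disable_ssh =
  if Int64.compare ssh_enabled_timeout 0L > 0 then
    if ssh_expiry > now then
      (* Expiry is still in the future: reschedule the "disable SSH" job. *)
      schedule_disable_ssh_job ~at:ssh_expiry
    else
      (* Expiry passed while xapi was down: disable SSH now. *)
      disable_ssh ()
```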
Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
Viewing RRDs produced by xcp-rrdd is difficult, because the format is incompatible with rrdtool.
rrdtool has a hardcoded limit of 20 characters for RRD names, for backward compatibility with its binary format.

Steps:
* given a directory of xml .gz files containing xcp-rrdd produced rrds
* invokes itself recursively with each file in turn using xargs -P (easy way to parallelize on OCaml 4)
* load all RRDs, and split them into separate files, allowing us to shorten many of their names without conflicts
* some names are still too long, there is a builtin translation table to shorten these
* once split, an .rrd file is created using 'rrdtool restore'. This can then be queried/inspected/transformed by rrdtool as needed
* a .sh script is produced that can plot the RRD if desired.

There are many RRDs so plotting isn't done automatically yet.

RRDs contain min/avg/max usually, so this is drawn as a strong line at the average, and an area in a lighter color for min/max
(especially useful for historic data that has been aggregated).

Caveats:
* we don't know the unit name, that is part of the XAPI metadata, but not the XML apparently?
* separate plots are generated for separate intervals, it'd be nice to join all these into the same graph
* the visualization type is not the best for all RRDs, some might benefit from a smoother line, etc.
* for now the tool is just built, but not installed (that'll require a .spec change too and can be done later)

This is just a starting point to be able to visualize this data somehow, and we can improve the actual plotting later.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Add 'tyre' dependency.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
 - Refine the exception raised when host.enable_ssh/host.disable_ssh fails
 - Reset host.ssh_expiry to the default when host.enable_ssh is called with
   no timeout

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
When we have multiple SM plugins in XAPI for the same type (which
happens only because of past problems) and want to remove the obsolete
one, do this by reference. The code so far was assuming only one per
type and looked up the reference by name, which was not unique and hence
could end up removing the wrong SM entry.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
When we have multiple SM plugins in XAPI for the same type (which
happens only because of past problems) and want to remove the obsolete
one, do this by reference. The code so far was assuming only one per
type and looked up the reference by name, which was not unique and hence
could end up removing the wrong SM entry.
When adding a feature, developers had to change both the variant and the
all_features list. Now the list is autogenerated from the variant, and the
compiler will complain if a feature's properties are not defined. Also
reduced the complexity of the code in the rest of the module.

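Illustrative only (the mechanism actually used in xapi may differ): with
Jane Street's ppx_enumerate, the "all values" list is derived from the
variant itself, so it can never go out of sync with it. The constructor
names below are just examples:

```ocaml
(* [@@deriving enumerate] generates [val all : t list] containing every
   constructor; a hand-maintained all_features list is no longer needed. *)
type t = VLAN | QoS | Pool_size [@@deriving enumerate]

let count_features () = List.length all
```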
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
I tried sharing more code between hard and soft affinities, but the memory
management of the two cpumaps blows up the number of branches that need to be
taken care of, making it more worthwhile to duplicate a bit of code instead.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
No functional change. This prepares pre_build in the domain module to be able
to set the hard affinity mask, without communicating the mask to xenguest
through xenstore.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
BengangY and others added 27 commits June 5, 2025 03:35
To handle deviations in CPU rates, Derive values exceeding the maximum
by up to 5% are capped at the maximum; others are marked as unknown.
This logic is specific to Derive data sources because they represent
rates derived from differences over time, which can occasionally exceed
expected bounds due to measurement inaccuracies.
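A minimal sketch of that capping rule (hypothetical names, not the actual
xcp-rrdd code), with `None` standing in for "unknown":

```ocaml
(* Clamp a Derive reading against its declared maximum: values up to 5%
   over the maximum are capped at the maximum, anything further out is
   treated as unknown. *)
let clamp_derive ~max_value value =
  if value <= max_value then Some value
  else if value <= max_value *. 1.05 then Some max_value
  else None
```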
In XSI-1915, MCS shut down a VM and tried to destroy a VBD right after
MCS received the event triggered by the power_state change, and the
destroy failed. The failure reasons are:
1. The update of the VM's power_state and the update of its VBDs are not
one transaction, so the client may receive the event for the power_state
update and operate on VBDs before the VBD update has happened.
2. The VM is running on a supporter, so the DB operation needs to send an
RPC to the coordinator, which takes time.
3. Between the update of the VM's power_state and the update of the VBDs,
xapi also updates the field pending_guidances, which needs at least 8 DB
operations. This further delays the VBD update.

It's not straightforward to add transactions for these DB operations.
The workaround is to move the pending_guidances update to the end of the
DB operations for VBDs, VIFs, GPUs, etc., so that the VBDs are updated
immediately after the update of the VM's power_state.
Use cram tests to expect the desired output of the command instead

This reduces the amount of text displayed when running tests, which
makes locating the errors in the logs easier

When the output of the tools changes deliberately, the expect files can
be changed with `dune runtest --auto-promote`
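For example, a cram test is just a `.t` file where a command is run and
its output is compared against the recorded expectation (the command and
output below are made up):

```
  $ echo hello
  hello
```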
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Some rough guidelines on the contribution process for the project.
Intended more as a starting point for a discussion.
I wrote this in ~2022, so I don't fully remember how it all works, but I
tried to document what I know in the commit message, in the CLI flag
docs and here:
```
scp -r root@$YOURBOX:/var/lib/xcp/blobs/rrds /tmp/rrds
dune exec ./rrdview.exe -- /tmp/rrds
bash /tmp/rrds/16db833b-7cd6-4b69-9037-144076c71033.cpu_avg.DERIVE.sh
```

![rrd](https://github.com/user-attachments/assets/b3c98936-cc29-4871-b4ea-1511853d623e)

Viewing RRDs produced by xcp-rrdd is difficult, because the format is
incompatible with rrdtool. rrdtool has a hardcoded limit of 20 characters
for RRD names, for backward compatibility with its binary format.

Steps:
* given a directory of xml .gz files containing xcp-rrdd produced rrds
* invokes itself recursively with each file in turn using xargs -P (easy
way to parallelize on OCaml 4)
* load all RRDs, and split them into separate files, allowing us to
shorten many of their names without conflicts
* some names are still too long, there is a builtin translation table to
shorten these
* once split, an .rrd file is created using 'rrdtool restore'. This can
then be queried/inspected/transformed by rrdtool as needed
* a .sh script is produced that can plot the RRD if desired.

There are many RRDs so plotting isn't done automatically yet.

RRDs contain min/avg/max usually, so this is drawn as a strong line at
the average, and an area in a lighter color for min/max (especially
useful for historic data that has been aggregated).

Caveats:
* we don't know the unit name, that is part of the XAPI metadata, but
not the XML apparently?
* separate plots are generated for separate intervals, it'd be nice to
join all these into the same graph
* the visualization type is not the best for all RRDs, some might
benefit from a smoother line, etc.
* for now the tool is just built, but not installed (that'll require a
.spec change too and can be done later)
* there is some code there to start parsing the data source definitions;
eventually I wanted to plot the data using OCaml instead of rrdtool
(e.g. generate Vega/Vega-lite graphs), but I can't find which branch I
put *that* code on, and what I have here is incomplete (or maybe I never
wrote that part, just thought about it). We could trim the dead code
from here if needed, but it might be useful if we continue improving the
tool later, so for now I left the parsing in.

This is just a starting point to be able to visualize this data somehow,
and we can improve the actual plotting later.
The consolidator used to be aware of which domains were paused; this was used
exclusively to avoid reporting memory changes for paused domains. Move that
responsibility to the domain memory reporter instead; this makes the decision
local, simplifying the code.

This is useful to separate the memory code from the rest of rrdd.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Update all `25.20.0-next` to `25.21.0` in `datamodel_lifecycle.ml`.

Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
The /sys/fs/cgroup/systemd/cgroup.procs file is not always present,
particularly on updated Linux systems with a newer cgroup setup and
SystemD, so fall back to the root /sys/fs/cgroup/cgroup.procs.
Also handle errors and report them back to OCaml.

Although SystemD discourages handling cgroups without service
configuration changes, the root cgroup is a bit special as it receives
processes from multiple sources, including the kernel.

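A sketch of the path-selection part only (hypothetical helper; the real
change lives in the C helper and reports errors back to OCaml):

```ocaml
(* Prefer the legacy cgroup-v1 systemd path; fall back to the unified
   cgroup-v2 root when it is not present. *)
let cgroup_procs_path () =
  let legacy = "/sys/fs/cgroup/systemd/cgroup.procs" in
  if Sys.file_exists legacy then legacy
  else "/sys/fs/cgroup/cgroup.procs"
```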
Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Update all `25.20.0-next` to `25.21.0` in `datamodel_lifecycle.ml`.
Unfortunately mirage-crypto has accumulated breaking changes:
- Cstructs have been replaced with strings
- The digestif library has replaced the ad-hoc hash implementation

A deprecation has happened as well:
- RNG initialization has changed

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
This API call and corresponding XE implementation calls a host plugin
on the host where a VM is running. It thus takes care of finding the
right host, compared to Host.call_plugin where this would be left to the
user.

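Conceptually (client-binding names assumed, not copied from this PR), the
new call saves the user from doing the host lookup themselves:

```ocaml
(* Resolve the host the VM is resident on, then call the plugin there,
   instead of asking the caller to pick a host for Host.call_plugin. *)
let call_plugin_on_vm_host ~rpc ~session_id ~vm ~plugin ~fn ~args =
  let host = Client.VM.get_resident_on ~rpc ~session_id ~self:vm in
  Client.Host.call_plugin ~rpc ~session_id ~host ~plugin ~fn ~args
```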
Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Add a new function that will invoke a callback every time one of the tasks is
deemed non-pending. This will allow its users to:

1) track the progress of tasks within the submitted batch
2) schedule new tasks to replace the completed ones

Modify wait_for_all_inner so that it adds the tasks returned from the callback
to its internal set on every new task completion.

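An illustrative use (the exact signature is assumed, not taken from this
PR): the callback returns replacement tasks, so a constant number of
tasks stays in flight:

```ocaml
(* [start_next ()] is a hypothetical supplier of the next task, or None
   when the queue is empty; returning [] adds nothing to the waiter. *)
let run_with_constant_flow ~wait_for_all_with_callback ~start_next tasks =
  let on_task_finished _finished_task =
    match start_next () with Some t -> [t] | None -> []
  in
  wait_for_all_with_callback ~callback:on_task_finished tasks
```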
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
With bab83d9, host evacuation was parallelized
by grouping VMs into batches, and starting a new batch once the previous one
has finished. This means that a single slow VM can potentially slow down the
whole evacuation.

Instead use Tasks.wait_for_all_with_callback to schedule a new migration as
soon as any of the previous ones have finished, thus maintaining a constant
flow of n migrations.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
This API call and corresponding XE implementation calls a host plugin on
the host where a VM is running. It thus takes care of finding the right
host, compared to Host.call_plugin where this would be left to the user.
at least for a while longer...

Mirrors the changes in the 1.249 LCM branch: #6473

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Currently rrdd needs to know when a metric comes from a newly created domain
(after a local migration, for example). This is because when a new domain is
created the counters start from zero again. This needs special logic for
aggregating metrics since xcp-rrdd needs to provide continuity of metrics of a
VM with a UUID, even if the domid changes.

Previously rrdd fetched the data about domains before metrics from plugins
were collected, and reused the data for self-reported metrics. While this meant
that for self-reported metrics it was impossible to miss collected information,
for plugin metrics it meant that for newly created and destroyed domains, the
mapping between domain id and VM UUID was not available.

With the current change the domain ids and VM UUIDs are collected every
iteration of the monitor loop, and kept for one more iteration, so domains
destroyed in the last iteration are remembered and not missed.

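A sketch of the bookkeeping (hypothetical types and names): the lookup
used while aggregating plugin metrics is the union of the current and the
previous iteration's domid-to-UUID maps:

```ocaml
module IntMap = Map.Make (Int)

(* Prefer the freshest mapping when a domid appears in both iterations;
   domains destroyed since the last reading remain resolvable. *)
let domains_for_lookup ~current ~previous =
  IntMap.union (fun _domid uuid_now _uuid_before -> Some uuid_now)
    current previous
```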
With this done it's now safe to move the host and memory metrics collection
into its own plugin.

Also use sequences more thoroughly in the code for transformations

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
The only use of it was a parameter that was not used anywhere

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
at least for a while longer...

Mirrors the changes in the 1.249 LCM branch: #6473
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Currently rrdd needs to know when a metric comes from a new domain
(after a local migration, for example). This is because when a new
domain is created the counters start from zero again, and so this needs
special logic when aggregating the metrics into rrds.

Previously rrdd collected this information before metrics were
collected; this means that metrics collected by plugins could be lost
if the domain was created in that small amount of time, or if the
domain was destroyed after a plugin collected data about it.

With the current change the domains are collected every loop and added
to the domains collected in the previous loop, to avoid missing any
newly created or destroyed domains. The current iteration only gets fed
data from the last iteration, to avoid accumulating all domains seen
since the start of xcp-rrdd.

With this done it's now safe to move the host and memory metrics
collection into its own plugin.

Also use sequences more thoroughly in the code for transformations.

I've manually tested this change by repeatedly single-host
live-migrating a VM and checking that no beats are missed on the graphs.
![Screenshot 2025-06-09 at 15 55 54](https://github.com/user-attachments/assets/8d5dea0a-a1aa-4a49-a712-9512e18036cc)
With bab83d9, host evacuation was parallelized by grouping VMs into
batches, and starting a new batch once the previous one has finished.
This means that a single slow VM can potentially slow down the whole
evacuation.

Add a new `Tasks.wait_for_all_with_callback` function that will invoke a
callback every time one of the tasks is deemed non-pending. This will
allow its users to:

1) track the progress of tasks within the submitted batch
2) schedule new tasks to replace the completed ones

Use the new `Tasks.wait_for_all_with_callback` in `xapi_host` to
schedule a new migration as soon as any of the previous ones have
finished, thus maintaining a constant flow of `n` migrations.

Additionally, expose the `evacuate-batch-size` parameter in the CLI; this
was missed when it was originally added, with the CLI setting it to `0`
(pick the default) all the time.

===

Manually tested multiple times, confirmed to not break anything and to
actually maintain a constant flow of migrations. This should greatly
speed up host evacuations when there is a combination of bigger and
smaller VMs (in terms of memory/disk, or VMs with some other reason for
slow migration) on the host.
Unfortunately mirage-crypto has accumulated breaking changes:
- Cstructs have been replaced with strings
- The digestif library has replaced the ad-hoc hash implementation

A deprecation has happened as well:
- RNG initialization has changed

Because there are breaking changes, xs-opam changes need to be
introduced at the same time:
xapi-project/xs-opam#731

Only xapi is affected by the breaking builds, so no other toolstack
repositories have incoming PRs.
I've tested builds with smoke and validation tests: SR 218740.

This means that the merge will be done as follows:
- Both PRs are approved
- First, xs-opam will be merged (with failing CI)
- Then this PR will be merged via the merge train that runs tests
before actually merging
- CI is rerun manually on xs-opam to make it green again

After merging both, xenserver's CI should create a successful build with
both PRs included.
The /sys/fs/cgroup/systemd/cgroup.procs file is not always present,
particularly on updated Linux systems with a newer cgroup setup and
SystemD, so fall back to the root /sys/fs/cgroup/cgroup.procs.
Also handle errors and report them back to OCaml.

Although SystemD discourages handling cgroups without service
configuration changes, the root cgroup is a bit special as it receives
processes from multiple sources, including the kernel.
Also fix the Makefile, so that 'make clean' also deletes the `.o.d` files.

This avoids accidentally adding these files to git
(although normally dune would invoke make in _build,
 only if you manually invoke it would it create these extra files):
```
A ocaml/forkexecd/helper/close_from.o
A ocaml/forkexecd/helper/close_from.o.d
A ocaml/forkexecd/helper/syslog.o
A ocaml/forkexecd/helper/syslog.o.d
A ocaml/forkexecd/helper/vfork_helper
A ocaml/forkexecd/helper/vfork_helper.o
A ocaml/forkexecd/helper/vfork_helper.o.d
```

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Also fix the Makefile, so that 'make clean' also deletes the `.o.d`
files.

This avoids accidentally adding these files to git (although normally
dune would invoke make in _build,
 only if you manually invoke it would it create these extra files):
```
A ocaml/forkexecd/helper/close_from.o
A ocaml/forkexecd/helper/close_from.o.d
A ocaml/forkexecd/helper/syslog.o
A ocaml/forkexecd/helper/syslog.o.d
A ocaml/forkexecd/helper/vfork_helper
A ocaml/forkexecd/helper/vfork_helper.o
A ocaml/forkexecd/helper/vfork_helper.o.d
```
@changlei-li
Contributor Author

I need to fix the CI error. It's strange the error didn't emerge before.
