Skip to content

Sync feature/py3 branch with master #5894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 72 commits into from
Jul 31, 2024

Conversation

stephenchengCloud
Copy link
Contributor

$ git show f0272f38447ecdb1b6ab3092770aa74570fa8a56
commit f0272f38447ecdb1b6ab3092770aa74570fa8a56 (HEAD -> sync_with_master)
Merge: 7a78d315f bc7e730c7
Author: Stephen Cheng <stephen.cheng@cloud.com>
Date:   Wed Jul 31 07:19:52 2024 +0100

    Merge branch 'master' into sync_with_master
    
    Signed-off-by: Stephen Cheng <stephen.cheng@cloud.com>

diff --cc python3/Makefile
index 514d21e5c,424dce190..13bc58546
--- a/python3/Makefile
+++ b/python3/Makefile
@@@ -22,40 -11,3 +22,39 @@@ install
        $(IDATA) packages/observer.py $(DESTDIR)$(SITE3_DIR)/
        $(IDATA) dnf_plugins/accesstoken.py $(DESTDIR)$(SITE3_DIR)/$(DNF_PLUGIN_DIR)/
        $(IDATA) dnf_plugins/ptoken.py $(DESTDIR)$(SITE3_DIR)/$(DNF_PLUGIN_DIR)/
 +
 +      $(IPROG) libexec/host-display $(DESTDIR)$(LIBEXECDIR)
 +      $(IPROG) libexec/link-vms-by-sr.py $(DESTDIR)$(LIBEXECDIR)
 +      $(IPROG) libexec/usb_reset.py $(DESTDIR)$(LIBEXECDIR)
 +      $(IPROG) libexec/usb_scan.py $(DESTDIR)$(LIBEXECDIR)
 +      $(IPROG) libexec/nbd_client_manager.py $(DESTDIR)$(LIBEXECDIR)
-       $(IPROG) libexec/probe-device-for-file $(DESTDIR)$(LIBEXECDIR)
 +      $(IPROG) libexec/print-custom-templates $(DESTDIR)$(LIBEXECDIR)
 +      $(IPROG) libexec/mail-alarm $(DESTDIR)$(LIBEXECDIR)
 +      $(IPROG) libexec/backup-sr-metadata.py $(DESTDIR)$(LIBEXECDIR)
 +      $(IPROG) libexec/restore-sr-metadata.py $(DESTDIR)$(LIBEXECDIR)
 +
 +      $(IPROG) bin/hfx_filename $(DESTDIR)$(OPTDIR)/bin
 +      $(IPROG) bin/xe-reset-networking $(DESTDIR)$(OPTDIR)/bin
 +      $(IPROG) bin/perfmon $(DESTDIR)$(OPTDIR)/bin

minglumlu and others added 30 commits July 9, 2024 12:06
Prior to this commit, the xapi on the coordinator host records the
'Unix.gettimeofday' as the timestamps of the heartbeat with other pool
supporter hosts. When the system clock is updated with a huge jump
forward, the timestamps would be far back into the past. This would
cause the xapi assumes that the hosts are offline as long time no
heartbeats.

In this commit, the timestamps are changed to get from a monotonic
clock. In this way, the system clock changes will not impact the
heartbeats' timestamps any more.

Additionally, Host_metrics.last_updated is only set when the object is
created. It's useless in check_host_liveness at all.

Signed-off-by: Ming Lu <ming.lu@cloud.com>
Xapi will send alert when the user is running on a Corosync 2 cluster
and prompting them to upgrade. This should only happen on XS 9 and is
behind the corosync3 feature flag.

XenCenter can take advantage of this alert message, but should have its
own warning message to tell the user why/how to perform the upgrade.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
By default message-switch calls are serialized for backwards compatibility reasons in the Lwt and Async modes.
(We tried enabling parallel actions by default but got some hard to debug failures in the CI).

This causes very long VM start times when multiple VBDs are plugged/unplugged concurrently: the operations are seen concurrently by message-switch,
but xapi-storage-script only retrieves and dispatches them sequentially, so any opportunity for parallel execution is lost.
Even though the actions themselves only take seconds, due to serialization, a VM start may take minutes.

Enable parallel processing explicitly here (instead of for all message-switch clients).
SMAPIv3 should expect to be called concurrently (on different hosts even), so in theory this change should be safe
and improve performance, but there are some known bugs in SMAPIv3 plugins currently.

So introduce a config file flag 'concurrent' for now, that defaults to false,
but that can be turned to 'true' for testing purposes.
When all SMAPIv3 concurrency bugs are believed to be fixed we can flip the default,
and eventually remove this flag once no more bugs are reported.
The configuration value is done as a global to simplify integrating intot he Lwt port,
instead of changing a lot of functions to thread through an argument.

This doesn't change the behaviour of xapi-storage-script in its default configuration.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…orosync3-msg

CP-49634: Add alerting for Corosync upgrade
Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
If we want to reproduce a failure we need to know the exact commandline that was used.
Longer than 80 chars is not a problem, this is a logfile, and a truncated line is worse than a long line.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Non-running VMs' metrics are stored in the coordinator. When the coordinator is
asked about the metrics try to unarchive them instead of failing while trying
to fetch the coordinator's IP address.

This needs to force the HTTP method of the query to be POST

Also returns a Service Unavailable when the host is marked as Broken.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Forces users to use an address, instead of being implicit, this avoid the
underlying cause for the issue fixed in the previous commit: it allowed a
coordinator to call Pool_role.get_master_address, which always fails.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
This makes the selection of the action obvious, previously the two booleans
made it hazy to understand the decision, and was part of the error why the
coordinator tried to get the coordinator address from the pool_role file (and
failed badly)

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Currently a List.assoc is used, which raises an unhandled exception.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Shell quoting changes in xen-api.git 65f152d
broke the dry-run functionality, as by quoting parameters in the way it was done
it meant the space separation was not properly handled to separate parameters
etc.

Signed-off-by: Alex Brett <alex.brett@cloud.com>
The xe-[backup,restore]-metadata scripts have cleanup logic designed to ensure
we do not leave any vbd objects etc behind.

This logic calls `vbd-unplug` with a 20s timeout, and then (regardless of the
result) allows up to 10s for any device specified in the VBD to disappear - if
it does not, it does not trigger a `vbd-destroy`.

This logic fails in the case where a VDI is attached to a VM running on the same
host, as the `device` field in the new VBD will be populated with the backend
device for the running VM. In this case, the `vbd-unplug` fails immediately (as
the vbd is not plugged because the original `vbd-plug` attempt fails as the VDI
is in use), but then we sit waiting for 10s for the device to disappear (which
is obviously does not), and then fail to trigger a `vbd-destroy`, leaving the
VBD behind.

Fix this by simply removing the wait for the device to disappear and always
attempting a `vbd-destroy`, as I am not aware of any situation where this
additional 10s wait will give any benefit given current behaviours.

Signed-off-by: Alex Brett <alex.brett@cloud.com>
This patch makes the feature use the debugfs utility, part of e2fsprogs. This
makes the system as a whole a heck of a lot better, if only because it won't be
able to parse XFS, ReiserFS or any of the many plugins of libfsimage.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Add a new option `-o` to xe-restore-metadata, which is used to distinguish
whether to allow use of legacy backup VDIs, or enforce only use of the new
format VDIs with known UUIDs.

Also modify xe-restore-metadata such that it no longer stops searching the
candidate list if only one VDI is found, but instead identifies all possible
backup VDIs. If more than one is found, and you are doing anything other than
listing the VDIs, the script will abort. This is to cover the case where a
malicious legacy format VDI is present - we will detect it and the expected
'real' backup VDI.

Modify xe-backup-metadata to always expect to use the deterministic UUID to
identify the VDI to add backups to, do not rely on the
`other-config:ctxs-pool-backup` property for identification in any way.

This is XSA-459 / CVE-2024-31144

Signed-off-by: Alex Brett <alex.brett@cloud.com>
- Quote a parameter
- Adjust how we check the returncode of some function calls to satifsy shellcheck
- Disable the warnings where we are explicitly relying on string splitting

Signed-off-by: Alex Brett <alex.brett@cloud.com>
This parameter was added in 7f1d315, but the
changes to always use the new metadata VDIs with known UUIDs means it is no
longer required, so remove it.

Signed-off-by: Alex Brett <alex.brett@cloud.com>
* Remove redundant parameter wiping

Removes ineffectual parameter wiping introduced by 6e24ca4.

Signed-off-by: Colin James <colin.barr@cloud.com>
…Unix/Lwt

It was a mix of Lwt and Unix code, which means that if the Unix call blocks the entire Lwt code blocks too.
This was only installed by the specfile in a -devel package.

`message-cli tail --follow` can be used to debug the IDL protocol instead.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Also add missing dependencies added recently

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Its implementation was identical, except for the use of time_limited_read in Threadext,
but the semantics is identical.

Use one well tested implementation instead of duplicating code.

One less function to convert to epoll.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…t_timed_read

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…poll4.0-drop

[epoll] CP-47536: Drop posix_channel and channel_helper: unused and a mix of …
…isten_p

CA-395512: process SMAPIv3 API calls concurrently
edwintorok and others added 27 commits July 23, 2024 23:30
This signature is completely unused.
We could instead generate a client.mli, but that is more complicated,
currently the client.mli it'd generate wouldn't be polymorphic enough.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Server.ml takes a while to compile, but most unit tests don't actually need it.
Reorganize Api_server into Api_server+Api_server_common, where the latter
suffices for unit tests.

'dune runtest' times are improved:
```
hyperfine --min-runs 2 'dune clean; dune runtest --cache=disabled' 'cd ../scm-prev; dune clean; dune runtest --cache=disabled'
Benchmark 1: dune clean; dune runtest --cache=disabled
  Time (mean ± σ):     103.491 s ±  1.596 s    [User: 374.464 s, System: 125.957 s]
  Range (min … max):   102.363 s … 104.620 s    2 runs

Benchmark 2: cd ../scm-prev; dune clean; dune runtest --cache=disabled
  Time (mean ± σ):     114.158 s ±  2.980 s    [User: 380.638 s, System: 134.558 s]
  Range (min … max):   112.051 s … 116.266 s    2 runs

Summary
  dune clean; dune runtest --cache=disabled ran
    1.10 ± 0.03 times faster than cd ../scm-prev; dune clean; dune runtest --cache=disabled
```

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
`sexpr` is now fully thread safe without having to use locks,
doesn't need to depend on threadext.

`gen_api_main` can use the external `uuidm` module directly,
without waiting for internal one to be built.

`dune-build-info` is only needed by xapi_version.

`xapi-stdext-unix` is not needed in `xapi-idl`

The sexplib ppx runtime also doesn't need to be linked in some libraries that do not use it anymore,
and where it is used it'll be automatically linked.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Xapi_version depends on Build_info which can change on every commit.
It is better to remove it from the dependencies of gen_api_main,
especially that gen_api_main is on the critical path for discovering more
dependencies.

The 'xapi_user_agent' constant got moved to Xapi_version.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
It'll be slower, but it can run a lot earlier in the build process.
Compiling the datamodels takes time, but compiling them for bytecode is faster.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Introduce a _minimal library, so we can start compiling server.ml earlier.

Build time reduced from 80s to:
```
Benchmark 1: dune clean; dune build --cache=disabled
  Time (mean ± σ):     67.081 s ±  0.190 s    [User: 326.847 s, System: 103.668 s
  Range (min … max):   66.946 s … 67.215 s    2 runs
```

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Remove most sleeps, and reduce test duration from 5 seconds to 1.
(If we do want to run a performance test we can increase these again)

Run just a basic test for 0.1 seconds instead of a performance test for 5s by
default.
(can still be tweaked by overriding SECS)

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Instead of every 5s. Speeds up testing, and may speed up startup somewhat.
And a connection try once every 0.5s won't create a lot of load on the system.
(If needed we could implement some form of exponential backoff here).

```
  Benchmark 1: dune clean; dune runtest --cache=disabled
  Time (mean ± σ):     97.642 s ±  0.933 s    [User: 354.132 s, System: 113.436 s]
  Range (min … max):   96.982 s … 98.302 s    2 runsi
```
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
These are errors in dune 3.15 and don't seem to be problematic

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Also make dune generate the opam metadata

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Only tests need it to generate crypto keys, but it's needed to create the
serial when signing certificates.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
We want to run these from 'quicktest', so make them available as libraries,
and add a _run.ml that would run them separately, just as before.
(running separately in the CI is better, because it can be parallelize)

No functional change.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Quicktest runs in Dom0, and the existing quickcheck tests run in the CI.
Some of these test as much the OCaml code, as its interaction with the system (e.g. behaviour of system calls).
So it is better to run these tests both in the CI and in Dom0.

We run these in long mode, to explore more randomly generated scenarios.

The seed can be controlled with QCHECK_SEED environment variable.
Similar to @StressTest it uses a random seed, instead of a fixed seed.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…i-to-quicktest

CP-50448: run quickcheck tests in XenRT
To ensure consistency and simplicity for both XS9 and XS8, this
change replaces bc command use to python.

Signed-off-by: Deli Zhang <deli.zhang@citrix.com>
…-50121

CP-50121: Remove bc package from XS9 dom0
Signed-off-by: Stephen Cheng <stephen.cheng@cloud.com>
@gangj gangj merged commit 67b28e0 into xapi-project:feature/py3 Jul 31, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants