-
Notifications
You must be signed in to change notification settings - Fork 292
Sync master to feature/configure-ssh-phase3 #6538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
psafont
merged 263 commits into
xapi-project:feature/configure-ssh-phase3
from
LunfanZhang:feature/ssh-p3-merge-master
Jun 17, 2025
Merged
Sync master to feature/configure-ssh-phase3 #6538
psafont
merged 263 commits into
xapi-project:feature/configure-ssh-phase3
from
LunfanZhang:feature/ssh-p3-merge-master
Jun 17, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
These shouldn't be necessary anymore, since no exception is raised. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
We should only log_backtrace if we are the final handler. The exception is raised here, so the caller will have a chance to log it. This was also inconsistent: some *_interface logged the backtrace, and others didn't. In theory there is a chance that the caller is buggy and doesn't log the correct backtrace. But if we simplify the places that call the Backtrace module, we'll have fewer chances of that going wrong. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
response_internal_error already calls Backtrace.is_important in the correct place, and logs the exception. There is no need to do that a 2nd time in the caller. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This shows how brittle the current Backtrace API is, this was missing from a lot of places. We have some better alternatives (`with_backtraces`, or a `try_with` function) that'd guarantee that `important` is always called in the right place, but that would be a more invasive change, which will be done in a followup commit. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
xapi-project#6432) Windows PV drivers do not store their version information into `drivers/{xeneventchn,xenvbd,xennet}` xenstore keys since 2015, see: * PV drivers commit `784af16810d705ba2bc02bab6ac93b24119f886c (Publish distribution information to xenstore)` https://xenbits.xen.org/gitweb/?p=pvdrivers/win/xenbus.git;a=commit;h=784af16810d705ba2bc02bab6ac93b24119f886c * Xen commit `71e64e163b2dae7d08f7d77ee942749663f484d5 (docs: Introduce xenstore paths for PV driver information)` https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=71e64e163b2dae7d08f7d77ee942749663f484d5 Instead it is stored like this: ``` drivers/0 = "XenServer XENBUS 9.1.9.105 " drivers/1 = "XenServer XENVBD 9.1.8.79 " drivers/2 = "XenServer XENVIF 9.1.12.101 " drivers/3 = "XenServer XENIFACE 9.1.10.87 " drivers/4 = "XenServer XENNET 9.1.7.65 " ``` Modify `xapi_guest_agent` to list such entries under `drivers/` and present version information for each driver.
This was not set properly when it was first introduced in xapi-24.15.0 Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
…mportant part of the backtrace (xapi-project#6430) Before/after can be seen clearly in the newly added unit test: * previously line 8 was not printed anywhere, even though that is the actual place the exception is raised * the whole backtrace was printed on a single line, which can be quite annoying After: * line 8 is present * the correct backtrace printer is used that prints each stack frame on a separate line with a counter ```diff --git a/ocaml/libs/log/test/log_test.t b/ocaml/libs/log/test/log_test.t index f25ee70..fbdeebf 100644 --- a/ocaml/libs/log/test/log_test.t +++ b/ocaml/libs/log/test/log_test.t @@ -1,4 +1,9 @@ $ ./log_test.exe | sed -re 's/[0-9]+T[0-9:.]+Z//' - [|debug||0 |main|log_test.ml] Raised at Xapi_stdext_pervasives__Pervasiveext.finally in file \"ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml\", line 39, characters 6-15\nCalled from Dune__exe__Log_test.(fun) in file \"ocaml/libs/log/test/log_test.ml\", line 15, characters 4-55\n + [|error||0 |main|backtrace] Raised Invalid_argument("index out of bounds") + [|error||0 |main|backtrace] 1/4 log_test.exe Raised at file ocaml/libs/log/test/log_test.ml, line 8 + [|error||0 |main|backtrace] 2/4 log_test.exe Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24 + [|error||0 |main|backtrace] 3/4 log_test.exe Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39 + [|error||0 |main|backtrace] 4/4 log_test.exe Called from file ocaml/libs/log/test/log_test.ml, line 15 + [|error||0 |main|backtrace] ```
The `expected_votes` field in corosync represents the number of hosts that is expected by the cluster stack. In the context of corosync, this is the same as the number of hosts as in the corosync.conf file*. This is a useful field to expose to the user so that they can see how many nodes actually are expected. We also have `Cluster_host` object, which represents xapi's view of what nodes should be in the cluster, but that might not be identical to corosync's view, especially when a host is disabled, but is still left in the list of Cluster_host objects. Although one could argue that we could infer this `expected_votes` field from the number of enabled Cluster_hosts, it might still be useful to get this information directly from corosync. *: there are ways in corosync to make one host cast multiple votes, but that feature is not used. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
The `expected_votes` field in corosync represents the number of hosts that is expected by the cluster stack. In the context of corosync, this is the same as the number of hosts as in the corosync.conf file*. This is a useful field to expose to the user so that they can see how many nodes actually are expected. We also have `Cluster_host` object, which represents xapi's view of what nodes should be in the cluster, but that might not be identical to corosync's view, especially when a host is disabled, but is still left in the list of Cluster_host objects. Although one could argue that we could infer this `expected_votes` field from the number of enabled Cluster_hosts, it might still be useful to get this information directly from corosync. *: there are ways in corosync to make one host cast multiple votes, but that feature is not used.
This function updates the snapshot related db fields after the storage migration. There is no need to leave this in the storage layer as xapi-storage-script will not be able to access xapi db. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Move this to storage_utils.ml since it is used by storage_smapiv1.ml and storage_mux.ml Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Extract common logic on finding vdi_info given vdi, and also add a parameter to specify where to find the VDI (locally or remotely). Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Move the update_snapshot_info_dest to storage mux as this function just does db operations. Also rescan the SR after updaing the content_id during SXM, so that the latest content_id can be reflected in the returned vdi_info, which gets used later on in `update_snapshot_info`
These will be used when mirroring a VDI that is on a SMAPIv3 SR. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Adding a few more states that will be tracked by the send_state and receive_state. These will be used later on for SMAPIv3 outbound migration. We do need to give them a default value in case the deserialisation of the on-disk copy does not have these values. These are not used yet, so no functional change. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
These functions will not be exposed in storage_migrate.ml any more. `receive_start` is kept in the mux for backwards compatability reasons, and will go straight to the SMAPIv1 implemenetation. `receive_start2` calls will not be directed to the storage_mux any more, instead it will be multiplexed by the new layer in storage_migrate. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Previously `MIRROR.receive_start2` is called as a remote function, i.e. `Remote.DATA.MIRROR.receive_start2` and it will be forwarded to the destination host and multiplexes based on the destination SR type. This is inconvenient as what `receive_start2` should do is more dependent on what the source SR type is. For example, if the source SR is using SMAPIv1, then `receive_start2` needs to tell the destination host to create snapshots VDIs, while this is not necessary if the source SR type is of SMAPIv3. Hence instead of calling `Remote.receive_start2`, call each individual functions inside `receive_start2` remotely, such as `Remote.VDI.create`, and these SMAPIv2 functions will still be multiplexed on the destiantion side, based on the destination SR type. And this is indeed what we want, imagine a case where we are migrating v1 -> v3, what we want is still create a snapshot VDI, but on the v3 SR. This does mean that the state tracking, such as `State.add`, which was previously tracked by the destination host, now needs to be tracked by the source host. And this will affect a number of other `receive_` functions such as `receive_finalize2` and `receive_cancel2`, which are updated accordingly. For backwards compatability reasons, we still need to preserve `receive_start` which might still be called from an older host trying to do a v1 -> v1 migration. And this is done by making sure that the SMAPIv2 done inside `receive_start` are all local, while the `receive_start` call itself is remote. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
…ct#6434) Continuing on xapi-project#6423 This PR changes the way `receive_start2` is called, and it can now multiplex based on the source SR and destination SR type. Note that `receive_start2` is introduced for inbound SXM for SMAPIv3, which is flagged off by default, so no need to worry about backwards compatability. Also introduces a couple of new types for SMAPIv3 migration, but they are uncurrently unused. There should still be no functional change. More to come...
Since we are now supporting qcow file `tap-ctl list` can return strings like: - "1564848 0 0 qcow2 /var/run/sr-mount/..." Without this patch the type "qcow2" is unknown and xcp-rrdd-iostat generates an error like: returned a line that could not be parsed. Ignoring This patch fixes the issue. Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
Since XCP-ng is now supporting qcow files `tap-ctl list` can return string like: - "1564848 0 0 qcow2 /var/run/sr-mount/..." Without this patch the type "qcow2" is unknown and xcp-rrdd-iostat generates an error like: `returned a line that could not be parsed. Ignoring` This patch fixes the issue by allowing qcow2 type.
…time reduction later (xapi-project#6362) Split the plug and unplug atomics into separate attach/activate and deactivate/detach atomics. This is gated by two separate flags and both are switched off by default. When switched on, this should still be a no-op by calling them in serial where plug/unplug are currently used, but will allow us to later use these atomics separately elsewhere for optimisation. Opening this as a draft PR as plug/unplug is fundamental so the logic needs to be watertight before we consider merging.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
On IO and timeout errors do not remove the device, but try to reconnect. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
xapi-project#6183) Activate old xapi_vdi.update_allowed_operations optimization: get_internal_records_where does a linear scan currently, so operating on N VDIs is O(N^2). Look at the VBD records directly, like before this 2013 commit which regressed it: 5097475 (We are going to optimize get_record separately so it doesn't go through serialization) This still requires validation and measurements (e.g. using @contificate 's benchmark)
…stent connections (xapi-project#6444) This was found during a previous stress test. Without a persistent connection the kernel might disconnect from the backing qemu/tapdisk in Dom0 when I/O takes too long, and that prevents all future I/O to that device from working, not just the one that had the speed hickup. Since our NBD connection is local, mark it as persistent, so that it tries to reconnect on timeout.
The regex removed parameters with a particular suffix instead of checking for the whole name. For example, after providing a uuid parameter to xe vif-move, network-uuid would no longer be suggested: ``` $ xe vif-move <TAB><TAB> network-uuid= uuid= $ xe vif-move uuid=0af7619c-0798-c5be-5a0e-20813a48c7df <TAB><TAB> ``` This is fixed now: ``` $ xe vif-move uuid=0af7619c-0798-c5be-5a0e-20813a48c7df <TAB><TAB> $ xe vif-move uuid=0af7619c-0798-c5be-5a0e-20813a48c7df network-uuid= ``` Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
There were several optional boolean parameters that were checked and used in cli_operations but were not included in cli_frontend (therefore would not be shown in suggestions or 'xe help command'). Add these to cli_frontend. is_unique is a ... unique case because it does not follow the style of the CLI parameters (which use dashes, not underscores, to separate words), so add 'is-unique' to cli_frontend but handle both in cli_operations. Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
This code hasn't been used for 10+ years. Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
…oject#6503) Instrument xenops vm non-atomic functions. Instruments: - `VM.add`, - `VM.stat`, - `VM.exists`, - `VM.list`. Instruments `switch_rpc` according to OpenTelemetry standard on instrumenting rpc calls. - `server.address` is the name of the message queue. Intruments sending the message on a queue according to OpenTelemetry standard on instrumenting messaging. - `destination` is the name of the message queue. `Tracing.with_tracing` now accepts an optional argument to set the Span Kind. 
Maintenance mode is entered by running Host.evacuate, followed by promoting a new pool coordinator and shutting down XAPI. We only export spans every 30s, so we may miss exporting the span for Host.evacuate. Ensure that we at least trigger the export when XAPI is about to shutdown. Do not wait for the export to finish, because this could take a long time (e.g. when exporting to a remote Jaeger instance). After this change I now see Host.evacuate properly in the exported trace. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
json_reformat cannot handle newline delimited json, it is easier if we have a command to reformat it ourselves. This can be useful when debugging why a trace is missing elements. Traces are stored as newline-delimited JSON in /var/log/dt/zipkinv2/json, however json_reformat cannot process them directly, and the lines can be very long and difficult to read otherwise. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
xapi-project#6525) Soon after Host.evacuate XAPI could be restarted (e.g. on coordinator promotion). But we only export traces every 30s, so we lose the spans from the last 30s, including the toplevel Host.evacuate span (which although long running is only emitted on completion). After this change I'm now able to see Host.evacuate and all the migrate calls in the exported distributed trace.
Use the numa-compat argument to be able to override the default numa placement using xenopsd.conf option. This allows to change the default placement when building a package This patch reverts some of the changes in e6f94be Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Adds the missing dependency to stdext-threads for testing Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
unfortunately threadext uses this log package, define a with_lock in xapi-log tests to avoid using the former. Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
SR-IOV (Single Root I/O Virtualization) is a technology that allows a single physical PCI Express (PCIe) device, such as a network adapter, to be shared efficiently among multiple virtual machines (VMs) or containers. It achieves this by creating Virtual Functions (VFs) that act as lightweight PCIe functions, each assigned to a VM, while the Physical Function (PF) remains responsible for managing the device. Add check in Sysfs.is_physical - check if there is "physfn" in the device dir to filter out VF, then XAPI will not create PIF object for VF during scan. Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
…oject#6488) Currently, xapi-storage-script uses the presence/absence of a smapi observer config file to determine whether it should create traces. This only happens on startup which means smapiv3 traces will often not be created when they should be (e.g. if the smapi component is enabled after startup, tracing for smapiv3 will not be) This commit updates the Smapi Observer forwarder to use an RPC client to send messages to xapi-storage-script, updating it on any relevant changes to the Observer. This is done in a generic way so that any future components could use also listen to this queue to call its own Observer functions. Also move the Observer RPC declarations to a common file to reduce code duplication and make some debug logs more helpful.
) Use the numa-compat argument to be able to override the default numa placement using xenopsd.conf option. This allows to change the default placement when building a package This patch reverts some of the changes in e6f94be
SR-IOV (Single Root I/O Virtualization) is a technology that allows a single physical PCI Express (PCIe) device, such as a network adapter, to be shared efficiently among multiple virtual machines (VMs) or containers. It achieves this by creating Virtual Functions (VFs) that act as lightweight PCIe functions, each assigned to a VM, while the Physical Function (PF) remains responsible for managing the device. Add check in Sysfs.is_physical - check if there is "physfn" in the device dir to filter out VF, then XAPI will not create PIF object for VF during scan.
…api-project#6513) `Host.cpu_info` list no longer contains a value associated with a "features" key, but the CLI implementation was hardcoded to expect it. As a result, it would show an unhelpful error like: ``` The server failed to handle your request, due to an internal error. The given message may give details useful for debugging the problem. message: INTERNAL_ERROR: [ Not_found ] ``` Instead use the `cpu_info_features` keys from xapi-consts (after moving these keys from xapi-globs first). Add the pool version of the command. Additionally document their output format: ``` # xe help host-get-cpu-features command name : host-get-cpu-features reqd params : optional params : uuid description : Prints a hexadecimal representation of the host's physical-CPU features for PV and HVM VMs. features_{hvm,pv} are "maximum" featuresets the host will accept during migrations, and features_{hvm,pv}_host will be used to start new VMs. # xe help pool-get-cpu-features command name : pool-get-cpu-features reqd params : optional params : description : Prints a hexadecimal representation of the pool's physical-CPU features for PV and HVM VMs. These are combinations of all the hosts' policies and are used when starting new VMs in a pool. # xe host-get-cpu-features uuid=7f566729-0ee7-47c4-853d-2c5f3a195ad4 features_pv : 1fc9cbf5-f6f83203-2991cbf5-00000123-00000001-000c0b39-00000000-00000100-00001000-ac000c00-00000000-00000000-00000000-00000000-00000000-00000000-1c020004-00000000-00000000-00000000-00000000-00000000 features_hvm: 1fcbfbff-f7fa3223-2d93fbff-00000523-00000001-001c0fbb-00000000-00000100-00101000-bc000c00-00000000-00000000-00000000-00000000-00000000-00000000-1c020004-00000000-00000000-00000000-00000000-00000000 features_pv_host: 1fc9cbf5-f6d83203-28100800-00000121-00000001-000c0b29-00000000-00000000-00001000-ac000400-00000000-00000000-00000000-00000000-00000000-00000000-0c000000-00000000-00000000-00000000-00000000-00000000 features_hvm_host: 1fcbfbff-f7fa3203-2c100800-00000121-00000001-001c0fab-00000000-00000000-00101000-bc000400-00000000-00000000-00000000-00000000-00000000-00000000-0c000000-00000000-00000000-00000000-00000000-00000000 # xe pool-get-cpu-features features_pv_host : 1fc9cbf5-f6d83203-28100800-00000121-00000001-000c0b29-00000000-00000000-00001000-ac000400-00000000-00000000-00000000-00000000-00000000-00000000-0c000000-00000000-00000000-00000000-00000000-00000000 features_hvm_host: 1fcbfbff-f7fa3203-2c100800-00000121-00000001-001c0fab-00000000-00000000-00101000-bc000400-00000000-00000000-00000000-00000000-00000000-00000000-0c000000-00000000-00000000-00000000-00000000-00000000 ``` ==== Tested manually
And remove the testing dependency to threadext to avoid a circular dependency
BengangY
approved these changes
Jun 17, 2025
changlei-li
approved these changes
Jun 17, 2025
0ac0908
into
xapi-project:feature/configure-ssh-phase3
16 checks passed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Merge master to feature branch