-
Notifications
You must be signed in to change notification settings - Fork 292
Merge master to feature branch #6348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
changlei-li
merged 85 commits into
xapi-project:feature/guest-vm-service-aware
from
changlei-li:private/changleli/merge-master-to-feature
Mar 10, 2025
Merged
Merge master to feature branch #6348
changlei-li
merged 85 commits into
xapi-project:feature/guest-vm-service-aware
from
changlei-li:private/changleli/merge-master-to-feature
Mar 10, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Adds functionality to marshal and unmarshal `TraceContext` in the tracing library. Instead of passing only the `traceparent` in `debug_info`, the entire `TraceContext` is now passed as JSON. This change enables the transfer of baggage across xenopsd boundaries, improving tracing propagation and debugging capabilities. This should also enable later use of `baggage` as means of passing the thread classification between components. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Refresh the trace_context with the correct traceparent when creating a span with `start_tracing_helper` in `context.ml`. This ensures the tracing of the context has the correct parent. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
In Xapi_event, events are accumulated by folding over the set of tables associated with a subscriber's subscription record. These events are mostly accumulated as lists within a tuple. There is no analogue of functional record update for tuples in OCaml. This means that the separate accumulations have to cite values they will not update. By introducing records, we can only cite the fields we actually change. Signed-off-by: Colin James <colin.barr@cloud.com>
In event accumulation for event.from, the code uses a mutable variable to thread a value through event accumulation. However, this value itself is accumulated in the fold: it gets larger for each matching database event that matches a subscription. To avoid the complexity in effectively having a global, mutable variable, we drop it and make it more evident when it changes: it is changed when no events are accumulated (by grab_range). In the case that no events are accumulated, but the deadline hasn't been reached, the code tries to collect events again. It is during a retry that the last_generation needs to be bumped, as it defines the starting point by which to query the database for recent and deleted objects. In short, if no suitable events were gleaned from matching database object records since a given point, there's no point starting from there again. Signed-off-by: Colin James <colin.barr@cloud.com>
In order to provide event records to subscribers, we must convert the accumulated events of the form (table, objref, time) into event records. The process of doing this is simple for objects in the database. The only difference is that deletion events do not provide a snapshot (as the object has been deleted). To avoid repeating ourselves, we define an "events_of" function that accumulates event records. The function takes an argument that specifies whether an attempt to provide a snapshot should be performed. The reification of events associated with messages - which are not stored in the database - is untouched. This relies on a callback instated elsewhere. Signed-off-by: Colin James <colin.barr@cloud.com>
Further changes to turn tuples into records. Also partially uncurries `collect_events` to make its intended use as a fold more apparent. Signed-off-by: Colin James <colin.barr@cloud.com>
Signed-off-by: Gang Ji <gang.ji@cloud.com>
Signed-off-by: Gang Ji <gang.ji@cloud.com>
Signed-off-by: Gang Ji <gang.ji@cloud.com>
To join a host into a pool with cluster enabled, the host must have one and only one IP configured on the joining cluster network. If not, after the host joinied the pool, GFS2 SR cannot be plugged on the joined host because an IP is required in the cluster network. Pool join in this scenario has been blocked in XenCenter, here we will block it inside xapi. Signed-off-by: Gang Ji <gang.ji@cloud.com>
Currently, users with read-only permissions can freely manipulate their own tasks (create, cancel, destroy, etc.). However, they cannot easily manipulate the "other_config" field. Some keys are specified in the datamodel as being writeable by subjects with the VM operator role. However, RBAC checking for keys was only ever implemented for the individual "add" and "remove" operations, not the "set_other_config" operation. This makes sense because the typical scenario is that the required writer role for an "other_config" field is more privileged than each of the individually-specified key-specific roles. In the case of tasks, the desired role for the set_other_config operation is the read-only role. However, we cannot simply demote the required "writer role" for the other_config field, because: 1) We must maintain key-based RBAC checks for the operation. For example, if a read-only user attempts to modify a task's other_config using set_other_config, the operation must fail if they've attempted to modify the value mapped to by some key that they are not privileged to use. 2) Key-based RBAC checking is special cased to "add_to" and "remove_from". You can attempt to ascribe key-related role restrictions to arbitrary messages but these will all fail at runtime, when invoked - because Rbac.check is special cased to expect to be able to extract out the key name from a "key" argument it receives, which is emitted for "add_to" and "remove_from". 3) There is an additional restriction that read-only users should not be able to modify arbitrary tasks, only their own. Both of these points require a custom implementation. To this end, we mark "other_config" as read-only (DynamicRO) and implement custom handlers for the "add_to", "remove_from", and "set" operations. In doing so, we implement a subset of the RBAC protection logic for keys. This custom implementation amounts to a relaxation of the usual RBAC rules: where "add_to" and "remove_from" (both purely destructive operations) cite a protected key (that they are not permitted to write), RBAC fails. In the custom implementation of "set_other_config", an under-privileged session can cite any key so long as their change is not destructive (it must preserve what is already there). Signed-off-by: Colin James <colin.barr@cloud.com>
This allows an xs_trace.exe to be run on any linux machine, whereas previously the use of forkexecd meant that it had to be run on a XS host. Signed-off-by: Steven Woods <steven.woods@citrix.com>
hcp_nss is a nss module to override the uid/gid of pooladmin when they ssh into dom0, as dom0 only support one single user However, the name wants to be updated to nss_override_id to reflect its usage Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
…roject#6293) This allows an xs_trace.exe to be run on any linux machine, whereas previously the use of forkexecd meant that it had to be run on a XS host. It also allows the logic to be simplified (json and ndjson can be handled in the same way).
A code clarification effort, consisting of: - Accumulate lists of event-related objects within a record, instead of a tuple. Using a record permits functional record update, which means we avoid citing lists we have not changed. - Remove `last_generation`: this mutable variable is updated in a few places and complicates the code. Instead of relying it on being in scope (and mutated in other places), we explicitly pass it in and then thread the update to its value through the retry loop. - Factor out a routine to convert `(table, obj, time)` entries into event records, defining the accumulation of `add`, `mod`, and `del` events in terms of it: message events remain special cased because their contents are not in the database. This avoids duplicated code.
) Currently, users with read-only permissions can freely manipulate their own tasks (create, cancel, destroy, etc.). However, they cannot easily manipulate the `other_config` field. Some keys are specified in the datamodel as being writeable by subjects with the VM operator role. However, RBAC checking for keys was only ever implemented for the individual "add" and "remove" operations, not the `set_other_config` operation. This makes sense because the typical scenario is that the required writer role for an "other_config" field is more privileged than each of the individually-specified key-specific roles. In the case of tasks, the desired role for the `set_other_config` operation is the read-only role. However, we cannot simply demote the required "writer role" for the `other_config` field, because: 1) We must maintain key-based RBAC checks for the operation. For example, if a read-only user attempts to modify a task's `other_config` using `set_other_config`, the operation must fail if they've attempted to modify the value mapped to by some key that they are not privileged to use. 2) Key-based RBAC checking is special cased to `add_to` and `remove_from`. You can attempt to ascribe key-related role restrictions to arbitrary messages but these will all fail at runtime, when invoked - because `Rbac.check` is special cased to expect to be able to extract out the key name from a `key` argument it receives, which is emitted for `add_to` and `remove_from`. 3) There is an additional restriction that read-only users should not be able to modify arbitrary tasks, only their own. Both of these points require a custom implementation. To this end, we mark `other_config` as read-only (DynamicRO) and implement custom handlers for the `add_to`, `remove_from`, and `set` operations. In doing so, we implement a subset of the RBAC protection logic for keys. This custom implementation amounts to a relaxation of the usual RBAC rules: where `add_to` and `remove_from` (both purely destructive operations) cite a protected key (that they are not permitted to write), RBAC fails. In the custom implementation of `set_other_config`, an under-privileged session can cite any key so long as their change is not destructive (it must preserve what is already there). --- The motivation for this change is in fusing serial `add_to_other_config` and `remove_from_other_config` operations into a single `set_other_config` call. I've included (possibly excessive) commentary within the code as the RBAC checking stuff is not touched a lot. This change may not be desirable overall, as it has low impact.
In an effort to make the logic clearer, we restructure the part(s) of Xapi_pool_helpers responsible for determining which operations are "valid" (i.e. become members of allowed_operations). The previous code is somewhat tedious to understand because it is unconditionally ineffective - the logical delineation of parts of the code is implicit, as the code computes the valid operation table in order, but many of the operations will have no effect (as later code to populate an operation's entry in the validity table do nothing). To try and simplify matters, we add some level of static partitioning of "blocking" and "waiting" operations (using separate polymorphic variants and coercions to widen into a type comprising all operations). Then, we replace loops with "find first" computations. The current logic should be clearer. It is roughly as follows: To compute the "valid" operations: - Start by assuming all operations are valid. We explicitly map each operation to an "Allowed" constructor to signify this. Though, technically, full coverage of the cases would guarantee that absence from this table implies the associated operation is valid - however, we maintain the old concern and maintain validity entries as a tri-state value (Unknown, Allowed, and Disallowed). - Determine the current operations reported by the pool object. At present, every operation is statically partitioned into "blocking" or "waiting" operations - which are both handled differently, with a "blocking" operation taking highest precedence - If there's an operation in current operations that is a blocking operation, this takes precedence. We cover all operation cases as follows: (1) blocking operations get marked as being invalid and cite the reason associated with the blocking operation discovered in the current operations set (which is an operation-specific "in progress" error). Then, all waiting operations get marked as invalid, citing a generic "in progress" error (unrelated to any specific operation). - If there's no blocking operation in current operations, but there is a waiting operation, we map all operations to a generic "in progress" error. - If there is no blocking or waiting operation in current operations (which, at present, is to say that current operations is empty), then we invalidate entries based on specific, hardcoded, invariants. For example, if HA is enabled on the pool, we invalidate the `ha_enable` operation on the pool object (with the reason explicitly explaining that you HA is already enabled). In future, we could consider encoding the relations between operations (and related object state) declaratively, such that we can either automatically generate such code (and test it alongside invariants, encoded as Prolog-esque rules) or use an incremental computation framework to automatically recompute the allowed operations based on changes in ambient pool state. Signed-off-by: Colin James <colin.barr@cloud.com>
* We currently have a mock implementation for driver-tool from the dmv-utils package. Adjust the command line argument structure to the real implementation to make the switch over easy. * Install the mock implementation if the real implementation is not in place. We might want to remove this later. Currently xapi.spec does not list the dependency. Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
`Remote.receive_finalize2` is called at the end of SXM to clean things up and compose the base and leaf images together. The compose operation should only be called while the VDI is deactivated. Currently a thread is created to call `receive_finalize2`, which could caused problems where the VM itself gets started while the `receive_finalize2`/`VDI.compose` is still in progress. This is not a safe operation to do. The fix here is to simply remove the thread and make the whole operation sequential. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Adds a simple parser for Xapi type expressions that is used to rewrite the types shown in the XenAPI class reference to include links to relevant documentation. Signed-off-by: Colin James <colin.barr@cloud.com>
Adds a simple parser for Xapi type expressions that is used to rewrite the types shown in the XenAPI class reference to include links to relevant documentation. ---  --- The parser is structured in the form of a Pratt parser. This may seem like overkill, but it keeps the door open to extension. Also, the parser must work with limited information (it has no knowledge of XenAPI object names). It works by noting that object types are always (?) suffixed by a type constructor, e.g. `VM ref` - therefore, it assumes any prefix form must be a valid class name, then subsequent left denotations take the left as a type parameter (for which all are unary, except `map` which has an alternative syntax that is special cased). Currently, the only interesting parts of the "rendered" type are: - Enum names become links that should scroll and temporarily highlight (flash) the relevant details. - Object names become links to their relevant page in the documentation. However, other structure is retained, such as where builtin (primitive) types are (e.g. int, bool, string, etc.), constructors names (ref, set, option, etc.). In future, these could link to relevant portions of a new article that explains all the types (for example, the format of datetime is perhaps non-obvious to someone reading the XenAPI reference pages).
hcp_nss is a nss module to override the uid/gid of pooladmin when they ssh into dom0, as dom0 only support one single user However, the name wants to be updated to nss_override_id to reflect its usage
dnf has been updated to dnf5 during clean rebuild event for XS9, The command line interface changed and we need to update accordingly https://dnf5.readthedocs.io/en/latest/changes_from_dnf4.7.html Following commands update are necessary for DNF5 update - updateinfo -> advisory - config-manager "--save --setopt" -> "setopt" - reposync removed "--downloadcomps" as it is the default behavior - repoquery format output no longer has line breaker, add `\n` in format argument to bring it back and align with YUM - repoquery drop "-a" option as it is default behavior Also improve the code with strong type checking - Available|Updates for two kinds of updateinfo instead of string Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
…ges = 0 (xapi-project#6313) Minor improvement - more detail about `wait_xen_free_mem` (called for waiting for scrubbing): Mention that it terminates in case free memory is smaller than required memory and `physinfo.scrub_pages = 0` (will never have enough memory for this VM).
dnf has been updated to dnf5 during clean rebuild event for XS9, The command line interface changed and we need to update accordingly https://dnf5.readthedocs.io/en/latest/changes_from_dnf4.7.html Following commands update are necessary for DNF5 update - updateinfo -> advisory - config-manager "--save --setopt" -> "setopt" - reposync removed "--downloadcomps" as it is the default behavior - repoquery format output no longer has line breaker, add `\n` in format argument to bring it back and align with YUM - repoquery drop "-a" option as it is default behavior Also improve the code with strong type checking - Available|Updates for two kinds of updateinfo instead of string
* We currently have a mock implementation for driver-tool from the dmv-utils package. Adjust the command line argument structure to the real implementation to make the switch over easy. * Install the mock implementation if the real implementation is not in place. We might want to remove this later. Currently xapi.spec does not list the dependency.
At XCP-ng, we are working on overcoming the 2TiB limitation for VM disks while preserving essential features. However, we currently cannot export a VDI to a Qcow2 file, nor import one. The purpose of this design proposal is to outline a solution for implementing VDI import/export in Qcow2 format. Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
…ion (xapi-project#6322) host.emergency_reenable_tls_verification can only be used after TLS verification is enabled in pool, raise an error if otherwise.
The vm parameter passed to SMAPIv2 calls is needed for the storage backend to know which domain slice it should operate on. For SXM, since there is no specific VM, we use the convention "MIR<vm-prefix>", where the <vm-prefix> is the first five characters of the hash of the vm uuid. This is not sufficient for VMs with multiple VDIs attached to it as they would result in the same vm parameter. So use a new convention "MIR<vm-prefix><vdi-prefix>" so we distinguish VDIs during migration. These vm parameters cannot be too long though as otherwise the storage backend will not accept them.
Removes all default permissions, and only adds needed permissions to the workflows that need it. - read contents: allows reading the repository, most workflows need this - write contents: allows to write to the repository. Needed to create releases - write pull-requests: allows to manipulate pull requests. Unfortunately this is needed to comment on PRs - read actions: allows reading the state of actions Note that because all workflows remove all permissions at the top level the current default permissions don't take any effect.
There is also a 'make analyze' to run this locally. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Macros beginning with `_` are reserved by the language. Fixes a CodeChecker warning. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Fixes a warning reported by codechecker Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Analyzer can't figure out that 'rc' is set on 'fd < 0', and we never try to use 'size_in_bytes'. Appears to be a limitation in the analyzer, it doesn't realize that `fd < 0` implies that `rc` is set to `NOT_IMPLEMENTED`, which raises an exception, so `caml_copy_int64` is never reached. But the logic there is quite hard to follow, so better be explicit, and ensure `size_in_bytes` is always initialized. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Add documention for the various PEM files found in a XenServer installation. Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Add documention for the various PEM files found in a XenServer installation.
Similar to the static analyzer introduced in xha, but for OCaml C stubs. This uses my https://ocaml.org/p/dune-compiledb/latest to product a `compile_commands.json` for the C stubs out of the `dune` rules (which can also be useful if you want to use `clangd` and get some editor integration about compiler warnings). This requires installing enough of the build dependencies to be able to run `dune` successfully (perhaps in the future that restriction can be removed). Caching is used though, and we only need to install the build deps when `dune` files change, otherwise we can reuse a cached `compile_commands.json`. Static analyzers, like CodeChecker support reading `compile_commands.json` and invoking static analyzers with the appropriate flags to preprocess, and analyze the C source code. We use `clang`, `clang-tidy` and `cppcheck` as the default analyzers, although more analyzers could be added in the future (CodeChecker supports converting `gcc -fanalyzer` output for example. GCC also supports emiting SARIF format directly, but github cannot parse it, because it doesn't implement the full SARIF spec). At the end we convert the results back to the standard SARIF format that Github also supports for its code scanning results, which will make it automatically add comments on PRs that introduce *new* bugs, without necessarily gating on them. I fixed some of the most obvious warnings, and suppressed some minor ones that we cannot fix (where the warning is caused by a Xen or OCaml header). More warnings can be skipped by adding to `.codechecker.json` if needed. So far it seems to have found a file descriptor leak in `unixpwd.c` on an error path, but I haven't gone through all the reports in detail yet.
These functions have a more relevant name than the previous ones, especially with the advent of OCaml 5.0 the runtime lock becomes more important and programmers must be aware of it. These functions were introduced in OCaml 3.12. Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
…t#6341) These functions have a more relevant name than the previous ones, especially with the advent of OCaml 5.0 the runtime lock becomes more important and programmers must be aware of it. These functions were introduced in OCaml 3.12.
The runtime lock is not being held when String_val(...) is used on OCaml values. Instead, we duplicate the strings whilst holding the lock, then use (and free) the values without the lock. Signed-off-by: Colin James <colin.barr@cloud.com>
Signed-off-by: Gerald Elder-Vass <gerald.elder-vass@cloud.com>
The runtime lock is not being held when `String_val(...)` is used on OCaml values. Instead, we duplicate the strings whilst holding the lock, then use (and free) the values without the lock. --- Thanks to @last-genius for spotting this. It's a bad pattern used in other places with less consequence (e.g. `Int_val` is unboxed and cannot be relocated), but they should all be changed - I just wanted to quickly address this in particular, since it's my mistake.
…i-project#6328) XenServer does not protect `/etc/rsyslog.d/xenserver.conf` when updating the rsyslog package (or the xenserver-release package on CH8.2CU1), as a result it is replaced with a fresh instance during updates which will lose customer configurations (if they have any). This script is primarily invoked via XenCenter when providing a new host for log forwarding. An update will be made to the XenServer rsyslog package to make it clear that `/etc/rsyslog.d/xenserver.conf` should not be edited by the customer. I've tested this by copying the updated script onto my DT box and using XenCenter to add a new log server - noting the new `/etc/rsyslog.d/custom.conf` file and the untouched `/etc/rsyslog.d/xenserver.conf` file.
…6334) An `if ... then raise exn;`was misread and make the code after impossible to execute, when that was not the intention. Remove all the ignore_<type> functions from stdext: a plain ignore with a type annotation can replace these. We should start using those for all ignores (there are too many of them, and can't be easily automated to do it in this PR) Passes internal tests: 213465 (one failure due to the recent vlan + clustering issue)
Signed-off-by: Colin James <colin.barr@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
DCO failed, but it was introduced in master, see #6318, suggest ignoring it and merge
|
gangj
approved these changes
Mar 10, 2025
Yeah, I think it is ok to ignore and proceed. |
minglumlu
approved these changes
Mar 10, 2025
7febfd2
into
xapi-project:feature/guest-vm-service-aware
17 checks passed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Conflict solved:
Commit b6ced44 is new for updating last_known_schema_hash and datamodel lifecycle