Merge master to feature branch #6348


Conversation

changlei-li
Contributor

Conflict resolved:

$ git diff
diff --cc ocaml/idl/schematest.ml
index b534f108a,255e094e1..000000000
--- a/ocaml/idl/schematest.ml
+++ b/ocaml/idl/schematest.ml
@@@ -3,7 -3,7 +3,11 @@@ let hash x = Digest.string x |> Digest.
  (* BEWARE: if this changes, check that schema has been bumped accordingly in
     ocaml/idl/datamodel_common.ml, usually schema_minor_vsn *)

++<<<<<<< HEAD
 +let last_known_schema_hash = "34390a071f5df0fac8dcf9423a9111ae"
++=======
+ let last_known_schema_hash = "ad67a64cd47cdea32085518c1fb38d27"
++>>>>>>> master

Commit b6ced44 is new; it updates last_known_schema_hash and the datamodel lifecycle.
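For context, the truncated diff header above shows how schematest derives the schema hash: a minimal sketch, assuming the elided tail of the line is `Digest.to_hex` (OCaml's stdlib `Digest` is MD5):

```ocaml
(* Sketch of the hash helper visible in the diff header above. The line
   is truncated in the diff; Digest.to_hex is an assumption consistent
   with the 32-hex-character last_known_schema_hash values. *)
let hash x = Digest.string x |> Digest.to_hex

let () =
  (* MD5 of "hello" is a well-known value. *)
  assert (hash "hello" = "5d41402abc4b2a76b9719d911017c592")
```

Whenever the generated schema text changes, this hex digest changes, which is why the conflict above is resolved by taking the newer hash and bumping the schema version.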

GabrielBuica and others added 30 commits February 12, 2025 21:58
Adds functionality to marshal and unmarshal `TraceContext` in the tracing
library.
Instead of passing only the `traceparent` in `debug_info`, the entire
`TraceContext` is now passed as JSON.

This change enables the transfer of baggage across xenopsd boundaries,
improving tracing propagation and debugging capabilities. It should
also enable later use of `baggage` as a means of passing the thread
classification between components.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
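A hypothetical sketch of the idea (the type and function names here are invented, not the real tracing-library API): instead of passing only the `traceparent` string, the whole context, baggage included, is serialized as JSON for `debug_info`:

```ocaml
(* Invented names; illustrates carrying baggage alongside traceparent. *)
type trace_context = {
  traceparent : string;
  baggage : (string * string) list;
}

let to_json ctx =
  let field (k, v) = Printf.sprintf {|"%s":"%s"|} k v in
  Printf.sprintf {|{"traceparent":"%s","baggage":{%s}}|}
    ctx.traceparent
    (String.concat "," (List.map field ctx.baggage))

let () =
  let ctx = { traceparent = "00-abc-def-01";
              baggage = [ ("thread-class", "external") ] } in
  print_endline (to_json ctx)
```

The receiving side can then recover both fields, rather than only the parent span identifier.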
Refresh the trace_context with the correct traceparent when creating a
span with `start_tracing_helper` in `context.ml`.

This ensures the tracing of the context has the correct parent.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
In Xapi_event, events are accumulated by folding over the set of tables
associated with a subscriber's subscription record. These events are
mostly accumulated as lists within a tuple.

There is no analogue of functional record update for tuples in OCaml.
This means that the separate accumulations have to cite values they will
not update. By introducing records, we can only cite the fields we
actually change.

Signed-off-by: Colin James <colin.barr@cloud.com>
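The motivation can be shown in miniature (field names here are hypothetical): with a tuple accumulator every arm must rebuild the whole tuple, while a record accumulator can use functional record update and cite only the field that changes:

```ocaml
(* Record accumulator: {acc with ...} leaves the other fields untouched. *)
type acc = { creates : int list; mods : int list; deletes : int list }

let empty = { creates = []; mods = []; deletes = [] }

(* Only the changed field is cited; with a tuple, all three components
   would have to be repeated here. *)
let add_mod x acc = { acc with mods = x :: acc.mods }

let () =
  let acc = add_mod 2 (add_mod 1 empty) in
  assert (acc.mods = [ 2; 1 ]);
  assert (acc.creates = [] && acc.deletes = [])
```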
In event accumulation for event.from, the code uses a mutable variable
to thread a value through event accumulation. However, this value itself
is accumulated in the fold: it grows for each database event that
matches a subscription.

To avoid the complexity in effectively having a global, mutable
variable, we drop it and make it more evident when it changes: it is
changed when no events are accumulated (by grab_range). In the case that
no events are accumulated, but the deadline hasn't been reached, the
code tries to collect events again. It is during a retry that the
last_generation needs to be bumped, as it defines the starting point by
which to query the database for recent and deleted objects. In short, if
no suitable events were gleaned from matching database object records
since a given point, there's no point starting from there again.

Signed-off-by: Colin James <colin.barr@cloud.com>
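The shape of the change, sketched with invented names: rather than mutating a global `last_generation`, the retry loop passes the next starting generation explicitly when no events were collected before the deadline:

```ocaml
(* grab_range returns the events found since a generation, plus the
   next generation to start from; both names are hypothetical. *)
let rec wait_for_events ~grab_range ~deadline ~last_generation now =
  let events, next_generation = grab_range last_generation in
  if events <> [] || now >= deadline then (events, last_generation)
  else
    (* Retry from the generation just examined, not the stale one. *)
    wait_for_events ~grab_range ~deadline
      ~last_generation:next_generation (now + 1)

let () =
  (* A grab_range that only yields an event once generation 3 is reached. *)
  let grab_range g = ((if g >= 3 then [ "ev" ] else []), g + 1) in
  let events, _ =
    wait_for_events ~grab_range ~deadline:10 ~last_generation:0 0
  in
  assert (events = [ "ev" ])
```

Making the threading explicit is what lets the real code drop the module-level mutable variable.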
In order to provide event records to subscribers, we must convert the
accumulated events of the form (table, objref, time) into event records.

The process of doing this is simple for objects in the database. The
only difference is that deletion events do not provide a snapshot (as
the object has been deleted). To avoid repeating ourselves, we define an
"events_of" function that accumulates event records. The function takes
an argument that specifies whether an attempt to provide a snapshot
should be performed.

The reification of events associated with messages - which are not
stored in the database - is untouched. This relies on a callback
instated elsewhere.

Signed-off-by: Colin James <colin.barr@cloud.com>
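A rough sketch of the factored-out accumulator (signatures are hypothetical, not the real Xapi_event code): one function builds event records, and a flag controls whether a snapshot lookup is attempted; deletion events pass `~snapshot:false` because the object is gone from the database:

```ocaml
type event = { table : string; objref : string; snapshot : string option }

(* lookup stands in for fetching the object's database record. *)
let events_of ~snapshot lookup entries =
  List.map
    (fun (table, objref) ->
      { table; objref;
        snapshot = (if snapshot then Some (lookup table objref) else None) })
    entries

let () =
  let lookup _table r = "record:" ^ r in
  let adds = events_of ~snapshot:true lookup [ ("VM", "ref1") ] in
  let dels = events_of ~snapshot:false lookup [ ("VM", "ref2") ] in
  assert ((List.hd adds).snapshot = Some "record:ref1");
  assert ((List.hd dels).snapshot = None)
```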
Further changes to turn tuples into records.

Also partially uncurries `collect_events` to make its intended use as a
fold more apparent.

Signed-off-by: Colin James <colin.barr@cloud.com>
Signed-off-by: Gang Ji <gang.ji@cloud.com>
Signed-off-by: Gang Ji <gang.ji@cloud.com>
Signed-off-by: Gang Ji <gang.ji@cloud.com>
To join a host to a pool with clustering enabled, the host must have one
and only one IP configured on the joining cluster network.

If not, after the host joins the pool, the GFS2 SR cannot be plugged on
the joined host, because an IP is required on the cluster network.

Pool join in this scenario was already blocked in XenCenter; here we
also block it inside xapi.

Signed-off-by: Gang Ji <gang.ji@cloud.com>
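A hedged sketch of the new pool-join check (names invented, not the real xapi code): the joining host must have exactly one IP configured on the cluster network:

```ocaml
(* Exactly one IP on the cluster network, otherwise the join is refused. *)
let assert_one_cluster_ip ~ips_on_cluster_network =
  match ips_on_cluster_network with
  | [ _ip ] -> Ok ()
  | [] -> Error "no IP configured on the cluster network"
  | _ -> Error "more than one IP configured on the cluster network"

let () =
  assert (assert_one_cluster_ip ~ips_on_cluster_network:[ "10.0.0.2" ] = Ok ());
  assert (assert_one_cluster_ip ~ips_on_cluster_network:[] <> Ok ());
  assert (assert_one_cluster_ip
            ~ips_on_cluster_network:[ "10.0.0.2"; "10.0.0.3" ] <> Ok ())
```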
Currently, users with read-only permissions can freely manipulate their
own tasks (create, cancel, destroy, etc.). However, they cannot easily
manipulate the "other_config" field. Some keys are specified in the
datamodel as being writeable by subjects with the VM operator role.

However, RBAC checking for keys was only ever implemented for the
individual "add" and "remove" operations, not the "set_other_config"
operation. This makes sense because the typical scenario is that the
required writer role for an "other_config" field is more privileged than
each of the individually-specified key-specific roles.

In the case of tasks, the desired role for the set_other_config
operation is the read-only role. However, we cannot simply
demote the required "writer role" for the other_config field, because:

1) We must maintain key-based RBAC checks for the operation. For
   example, if a read-only user attempts to modify a task's other_config
   using set_other_config, the operation must fail if they've attempted
   to modify the value mapped to by some key that they are not
   privileged to use.

2) Key-based RBAC checking is special cased to "add_to" and
   "remove_from". You can attempt to ascribe key-related role
   restrictions to arbitrary messages but these will all fail at
   runtime, when invoked - because Rbac.check is special cased to expect
   to be able to extract out the key name from a "key" argument it
   receives, which is emitted for "add_to" and "remove_from".

3) There is an additional restriction that read-only users should not be
   able to modify arbitrary tasks, only their own.

All three points require a custom implementation. To this end, we
mark "other_config" as read-only (DynamicRO) and implement custom
handlers for the "add_to", "remove_from", and "set" operations. In doing
so, we implement a subset of the RBAC protection logic for keys.

This custom implementation amounts to a relaxation of the usual RBAC
rules: where "add_to" and "remove_from" (both purely destructive
operations) cite a protected key (that they are not permitted to write),
RBAC fails. In the custom implementation of "set_other_config", an
under-privileged session can cite any key so long as their change is not
destructive (it must preserve what is already there).

Signed-off-by: Colin James <colin.barr@cloud.com>
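The relaxed rule can be illustrated in a few lines (this is an illustration, not the actual xapi implementation): a `set_other_config` from an under-privileged session is allowed only if every protected key's existing binding is preserved:

```ocaml
(* An other_config is modelled as an association list here. *)
let preserves_protected ~protected ~old_config ~new_config =
  List.for_all
    (fun key ->
      match List.assoc_opt key old_config with
      | None -> true (* key absent: nothing to destroy *)
      | Some v -> List.assoc_opt key new_config = Some v)
    protected

let () =
  let old_config = [ ("folder", "/prod"); ("note", "x") ] in
  (* Changing an unprotected key is fine... *)
  assert (preserves_protected ~protected:[ "folder" ] ~old_config
            ~new_config:[ ("folder", "/prod"); ("note", "y") ]);
  (* ...but dropping a protected binding is destructive and fails. *)
  assert (not (preserves_protected ~protected:[ "folder" ] ~old_config
                 ~new_config:[ ("note", "y") ]))
```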
This allows xs_trace.exe to be run on any Linux machine, whereas
previously the use of forkexecd meant that it had to be run on a XS host.

Signed-off-by: Steven Woods <steven.woods@citrix.com>
hcp_nss is an NSS module that overrides the uid/gid of pooladmin
users when they SSH into dom0, as dom0 supports only a single user.

However, the name should be updated to nss_override_id to reflect its
usage.

Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
…roject#6293)

This allows xs_trace.exe to be run on any Linux machine, whereas
previously the use of forkexecd meant that it had to be run on a XS
host. It also allows the logic to be simplified (json and ndjson can be
handled in the same way).
A code clarification effort, consisting of:

- Accumulate lists of event-related objects within a record, instead of
a tuple. Using a record permits functional record update, which means we
avoid citing lists we have not changed.
- Remove `last_generation`: this mutable variable is updated in a few
places and complicates the code. Instead of relying on it being in scope
(and mutated elsewhere), we explicitly pass it in and then thread
the update to its value through the retry loop.
- Factor out a routine to convert `(table, obj, time)` entries into
event records, defining the accumulation of `add`, `mod`, and `del`
events in terms of it: message events remain special cased because their
contents are not in the database. This avoids duplicated code.

The motivation for the `set_other_config` change described above is in
fusing serial `add_to_other_config` and `remove_from_other_config`
operations into a single `set_other_config` call. I've included
(possibly excessive) commentary within the code, as the RBAC checking
code is not touched often. This change may not be desirable overall, as
it has low impact.
In an effort to make the logic clearer, we restructure the part(s) of
Xapi_pool_helpers responsible for determining which operations are
"valid" (i.e. become members of allowed_operations).

The previous code is somewhat tedious to understand because parts of it
are unconditionally ineffective: the logical delineation of parts of the
code is implicit, as the code computes the valid operation table in
order, but many of the operations have no effect (later code to
populate an operation's entry in the validity table does nothing).

To try and simplify matters, we add some level of static partitioning of
"blocking" and "waiting" operations (using separate polymorphic variants
and coercions to widen into a type comprising all operations). Then, we
replace loops with "find first" computations.

The current logic should be clearer. Roughly, to compute the "valid"
operations:

- Start by assuming all operations are valid. We explicitly map each
  operation to an "Allowed" constructor to signify this. Technically,
  full coverage of the cases would guarantee that absence from this
  table implies the associated operation is valid; however, we keep the
  old behaviour and maintain validity entries as a tri-state value
  (Unknown, Allowed, and Disallowed).

- Determine the current operations reported by the pool object. At
  present, every operation is statically partitioned into "blocking" or
  "waiting" operations, which are handled differently, with a
  "blocking" operation taking highest precedence.

- If there's an operation in current operations that is a blocking
  operation, this takes precedence. We cover all operation cases as
  follows: (1) blocking operations get marked as being invalid and cite
  the reason associated with the blocking operation discovered in the
  current operations set (which is an operation-specific "in progress"
  error). Then, all waiting operations get marked as invalid, citing a
  generic "in progress" error (unrelated to any specific operation).

- If there's no blocking operation in current operations, but there is a
  waiting operation, we map all operations to a generic "in progress"
  error.

- If there is no blocking or waiting operation in current operations
  (which, at present, is to say that current operations is empty), then
  we invalidate entries based on specific, hardcoded invariants. For
  example, if HA is enabled on the pool, we invalidate the `ha_enable`
  operation on the pool object (with the reason explicitly explaining
  that HA is already enabled).

In future, we could consider encoding the relations between operations
(and related object state) declaratively, such that we can either
automatically generate such code (and test it alongside invariants,
encoded as Prolog-esque rules) or use an incremental computation
framework to automatically recompute the allowed operations based on
changes in ambient pool state.

Signed-off-by: Colin James <colin.barr@cloud.com>
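The precedence described above can be compressed into a toy sketch (the variants, operation names, and error messages here are invented): blocking current operations override waiting ones, and only an empty current-operations set reaches the invariant-based checks:

```ocaml
(* Static partition of operations via polymorphic variants. *)
type blocking = [ `ha_enable ]
type waiting = [ `designate_new_master ]
type operation = [ blocking | waiting ]

let blocking_ops : operation list = [ `ha_enable ]

let validity ~current_ops ~ha_enabled (op : operation) =
  (* "Find first" instead of a loop over the validity table. *)
  match List.find_opt (fun o -> List.mem o current_ops) blocking_ops with
  | Some _ -> Error "a blocking operation is in progress"
  | None ->
      if current_ops <> [] then Error "an operation is in progress"
      else if op = `ha_enable && ha_enabled then Error "HA is already enabled"
      else Ok ()

let () =
  assert (validity ~current_ops:[] ~ha_enabled:false `ha_enable = Ok ());
  assert (validity ~current_ops:[] ~ha_enabled:true `ha_enable <> Ok ());
  assert (validity ~current_ops:[ `designate_new_master ] ~ha_enabled:false
            `ha_enable <> Ok ())
```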
* We currently have a mock implementation for driver-tool from the
  dmv-utils package. Adjust the command line argument structure to match
  the real implementation to make the switchover easy.

* Install the mock implementation if the real implementation is not in
  place. We might want to remove this later. Currently xapi.spec does
  not list the dependency.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
`Remote.receive_finalize2` is called at the end of SXM to clean things
up and compose the base and leaf images together. The compose operation
should only be called while the VDI is deactivated. Currently a thread
is created to call `receive_finalize2`, which could cause problems
where the VM gets started while `receive_finalize2`/`VDI.compose`
is still in progress. This is not a safe operation.

The fix here is to simply remove the thread and make the whole operation
sequential.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Adds a simple parser for Xapi type expressions that is used to rewrite
the types shown in the XenAPI class reference to include links to
relevant documentation.

Signed-off-by: Colin James <colin.barr@cloud.com>

---

![image](https://github.com/user-attachments/assets/b1a515b3-6824-4ba8-a925-9989a8cc7270)

---

The parser is structured in the form of a Pratt parser. This may seem
like overkill, but it keeps the door open to extension. Also, the parser
must work with limited information (it has no knowledge of XenAPI object
names). It works by noting that object types are always (?) suffixed by
a type constructor, e.g. `VM ref`; therefore, it assumes any prefix
form must be a valid class name, and subsequent left denotations take
the expression on the left as a type parameter (all of which are unary,
except `map`, which has an alternative syntax that is special cased).

Currently, the only interesting parts of the "rendered" type are:
- Enum names become links that should scroll and temporarily highlight
(flash) the relevant details.
- Object names become links to their relevant page in the documentation.

However, other structure is retained, such as builtin (primitive) types
(e.g. int, bool, string) and constructor names (ref, set, option, etc.).
In future, these could link to relevant portions of a new article that
explains all the types (for example, the format of datetime is perhaps
non-obvious to someone reading the XenAPI reference pages).
dnf has been updated to dnf5 during the clean rebuild event for XS9.
The command line interface changed and we need to update accordingly:
https://dnf5.readthedocs.io/en/latest/changes_from_dnf4.7.html

The following command updates are necessary for DNF5:
- updateinfo -> advisory
- config-manager "--save --setopt" -> "setopt"
- reposync: removed "--downloadcomps" as it is the default behavior
- repoquery: format output no longer has a line break; add `\n` in the
  format argument to bring it back and align with YUM
- repoquery: drop the "-a" option as it is the default behavior

Also improve the code with strong type checking:
- Available|Updates for the two kinds of updateinfo instead of strings

Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
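A sketch of the strongly-typed mapping (the `Available`/`Updates` variant is from the commit message; the exact dnf5 argument strings below are assumptions based on the list above, not verified against the real code):

```ocaml
(* Two kinds of updateinfo query, as a variant instead of a string. *)
type updateinfo = Available | Updates

(* Assumed dnf5 argument spelling; illustrative only. *)
let advisory_args = function
  | Available -> [ "advisory"; "list"; "--available" ]
  | Updates -> [ "advisory"; "list"; "--updates" ]

let () =
  assert (advisory_args Available = [ "advisory"; "list"; "--available" ]);
  assert (advisory_args Updates <> advisory_args Available)
```

The variant makes an invalid updateinfo kind a compile-time error, rather than a stringly-typed mistake found at runtime.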
…ges = 0 (xapi-project#6313)

Minor improvement: more detail about `wait_xen_free_mem` (called to wait
for scrubbing). Mention that it terminates when free memory is smaller
than the required memory and `physinfo.scrub_pages = 0` (there will
never be enough memory for this VM).
At XCP-ng, we are working on overcoming the 2TiB limitation
for VM disks while preserving essential features.
However, we currently cannot export a VDI to a Qcow2 file,
nor import one. The purpose of this design proposal is to outline
a solution for implementing VDI import/export in Qcow2 format.

Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
GabrielBuica and others added 24 commits March 4, 2025 10:50
Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
…ion (xapi-project#6322)

host.emergency_reenable_tls_verification can only be used after TLS
verification has been enabled in the pool; raise an error otherwise.
The vm parameter passed to SMAPIv2 calls is needed for the storage
backend to know which domain slice it should operate on. For SXM, since
there is no specific VM, we use the convention "MIR<vm-prefix>", where
the <vm-prefix> is the first five characters of the hash of the vm uuid.
This is not sufficient for VMs with multiple VDIs attached, as they
would all result in the same vm parameter.

So use a new convention "MIR<vm-prefix><vdi-prefix>" so we distinguish
VDIs during migration. These vm parameters cannot be too long though as
otherwise the storage backend will not accept them.
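The naming convention described above, sketched directly (the helper names are invented; the "first five characters of the hash" detail is from the text, assuming an MD5 hex digest): both prefixes are short hash fragments, so the name stays short but is distinct per VDI:

```ocaml
(* First five hex characters of the MD5 of a uuid. *)
let prefix_of uuid = String.sub (Digest.to_hex (Digest.string uuid)) 0 5

let mirror_vm_name ~vm_uuid ~vdi_uuid =
  "MIR" ^ prefix_of vm_uuid ^ prefix_of vdi_uuid

let () =
  let a = mirror_vm_name ~vm_uuid:"vm-1" ~vdi_uuid:"vdi-1" in
  let b = mirror_vm_name ~vm_uuid:"vm-1" ~vdi_uuid:"vdi-2" in
  assert (String.length a = 13);
  assert (String.sub a 0 3 = "MIR");
  (* Same VM, different VDIs no longer collide. *)
  assert (a <> b)
```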
Removes all default permissions, and only adds needed permissions to the
workflows that need it.

- read contents: allows reading the repository; most workflows need this
- write contents: allows writing to the repository; needed to create
releases
- write pull-requests: allows manipulating pull requests; unfortunately
this is needed to comment on PRs
- read actions: allows reading the state of actions

Note that because all workflows remove all permissions at the top level,
the current default permissions don't have any effect.
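A minimal illustration of the pattern described (workflow and job names invented; the `permissions` keys are standard GitHub Actions syntax): drop all default permissions at the top level, then grant each job only what it needs:

```yaml
# Hypothetical workflow showing the least-privilege permissions pattern.
name: example
on: push

permissions: {}   # no default token permissions for any job

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read        # checkout only
    steps:
      - uses: actions/checkout@v4
  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write       # needed to create releases
    steps:
      - run: echo "release steps here"
```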
There is also a 'make analyze' to run this locally.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Macros beginning with `_` are reserved by the language.

Fixes a CodeChecker warning.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Fixes a warning reported by codechecker

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Analyzer can't figure out that 'rc' is set on 'fd < 0',
and we never try to use 'size_in_bytes'.

This appears to be a limitation in the analyzer: it doesn't realize that `fd < 0` implies that `rc` is set to `NOT_IMPLEMENTED`,
which raises an exception, so `caml_copy_int64` is never reached.

But the logic there is quite hard to follow, so it is better to be explicit and ensure `size_in_bytes` is always initialized.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Add documentation for the various PEM files found in a XenServer
installation.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Similar to the static analyzer introduced in xha, but for OCaml C stubs.

This uses my https://ocaml.org/p/dune-compiledb/latest to produce a
`compile_commands.json` for the C stubs out of the `dune` rules (which
can also be useful if you want to use `clangd` and get some editor
integration for compiler warnings). This requires installing enough of
the build dependencies to be able to run `dune` successfully (perhaps in
the future that restriction can be removed). Caching is used, though, and
we only need to install the build deps when `dune` files change;
otherwise we can reuse a cached `compile_commands.json`.

Static analyzers like CodeChecker support reading
`compile_commands.json` and invoking static analyzers with the
appropriate flags to preprocess and analyze the C source code. We use
`clang`, `clang-tidy` and `cppcheck` as the default analyzers, although
more analyzers could be added in the future (CodeChecker supports
converting `gcc -fanalyzer` output, for example. GCC also supports
emitting SARIF directly, but GitHub cannot parse it, because it
doesn't implement the full SARIF spec).

At the end we convert the results back to the standard SARIF format that
GitHub also supports for its code scanning results, which will make it
automatically add comments on PRs that introduce *new* bugs, without
necessarily gating on them.

I fixed some of the most obvious warnings, and suppressed some minor
ones that we cannot fix (where the warning is caused by a Xen or OCaml
header). More warnings can be skipped by adding to `.codechecker.json`
if needed.

So far it seems to have found a file descriptor leak in `unixpwd.c` on
an error path, but I haven't gone through all the reports in detail yet.
These functions have a more relevant name than the previous ones;
especially with the advent of OCaml 5.0, the runtime lock becomes more
important and programmers must be aware of it.

These functions were introduced in OCaml 3.12.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
The runtime lock is not being held when String_val(...) is used on OCaml
values. Instead, we duplicate the strings whilst holding the lock, then
use (and free) the values without the lock.

Signed-off-by: Colin James <colin.barr@cloud.com>
Signed-off-by: Gerald Elder-Vass <gerald.elder-vass@cloud.com>

---

Thanks to @last-genius for spotting this. It's a bad pattern used in
other places with less consequence (e.g. `Int_val` is unboxed and cannot
be relocated), but they should all be changed - I just wanted to quickly
address this in particular, since it's my mistake.
…i-project#6328)

XenServer does not protect `/etc/rsyslog.d/xenserver.conf` when updating
the rsyslog package (or the xenserver-release package on CH8.2CU1); as a
result, it is replaced with a fresh instance during updates, which loses
customer configurations (if they have any).
This script is primarily invoked via XenCenter when providing a new host
for log forwarding.
An update will be made to the XenServer rsyslog package to make it clear
that `/etc/rsyslog.d/xenserver.conf` should not be edited by the
customer.

I've tested this by copying the updated script onto my DT box and using
XenCenter to add a new log server - noting the new
`/etc/rsyslog.d/custom.conf` file and the untouched
`/etc/rsyslog.d/xenserver.conf` file.
…6334)

An `if ... then raise exn;` was misread and made the code after it
impossible to execute, which was not the intention.

Remove all the ignore_<type> functions from stdext: a plain ignore with
a type annotation can replace them. We should start using the annotated
form for all ignores (there are too many of them to easily automate the
change in this PR).

Passes internal tests: 213465 (one failure due to the recent vlan +
clustering issue)
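The replacement pattern, in two lines: a plain `ignore` with a type annotation documents (and type-checks) what is being discarded, making the dedicated ignore_<type> helpers unnecessary:

```ocaml
let f x = x + 1

let () =
  (* Before (with stdext): ignore_int (f 3). After: *)
  ignore (f 3 : int);
  (* The annotation still catches mistakes such as discarding a
     partially-applied function instead of its result. *)
  assert (f 3 = 4)
```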
Signed-off-by: Colin James <colin.barr@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
@changlei-li
Contributor Author

changlei-li commented Mar 10, 2025

DCO failed, but the failure was introduced in master (see #6318); I suggest ignoring it and merging.

Commit sha: [d137934](https://github.com/xapi-project/xen-api/pull/6348/commits/d1379349efd6da61e9ed10b79ffb20956bb677fa), Author: Guillaume, Committer: Guillaume; The sign-off is missing.
Commit sha: [61c6cd5](https://github.com/xapi-project/xen-api/pull/6348/commits/61c6cd5ca8406dfea37b60d38a45f552664e661c), Author: Guillaume, Committer: Guillaume; The sign-off is missing.

@gangj
Contributor

gangj commented Mar 10, 2025

DCO failed, but the failure was introduced in master (see #6318); I suggest ignoring it and merging.

Commit sha: [d137934](https://github.com/xapi-project/xen-api/pull/6348/commits/d1379349efd6da61e9ed10b79ffb20956bb677fa), Author: Guillaume, Committer: Guillaume; The sign-off is missing.
Commit sha: [61c6cd5](https://github.com/xapi-project/xen-api/pull/6348/commits/61c6cd5ca8406dfea37b60d38a45f552664e661c), Author: Guillaume, Committer: Guillaume; The sign-off is missing.

Yeah, I think it is ok to ignore and proceed.

@changlei-li changlei-li merged commit 7febfd2 into xapi-project:feature/guest-vm-service-aware Mar 10, 2025
17 checks passed
@changlei-li changlei-li deleted the private/changleli/merge-master-to-feature branch March 10, 2025 06:25