-
Notifications
You must be signed in to change notification settings - Fork 292
sync master to feature/guest-vm-service-aware #6383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync master to feature/guest-vm-service-aware #6383
Conversation
Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
Forkexecd was written to avoid some issues with Ocaml and multi-threading. Instead use C code to launch processes and avoid these issues. Interface remains unchanged from Ocaml side but implementation rely entirely on C code. vfork() is used to avoid performance memory issue. Reap of the processes are done directly. Code automatically reap child processes to avoid zombies. One small helper is used to better separate Ocaml and C code and handling syslog redirection. This allows to better debug in case of issues. Syslog handling is done in a separate process allowing to restart the toolstack and keep launched programs running; note that even with forkexecd daemon one process was used for this purpose. Code tries to keep compatibility with forkexecd, in particular: - SIGPIPE is ignored in the parent; - /dev/null is open with O_WRONLY even for stdin; - file descriptors are limited to 1024. We use close_range (if available) to reduce system calls to close file descriptors. Cgroup is set to avoid systemd closing processes on toolstack restart. There's a fuzzer program to check file remapping algorithm; for this reason the algorithm is in a separate file. To turn internal debug on you need to set FORKEXECD_DEBUG_LOGS C preprocessor macro to 1. Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Resolved onflicts: ocaml/idl/datamodel_errors.ml ocaml/xapi-consts/api_errors.ml Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
Resolved onflicts: ocaml/idl/datamodel_errors.ml ocaml/xapi-consts/api_errors.ml
These changes append the `baggage` of the `trace_context` to the list of environment variables that are about to be submitted to SM scripts by `forkexec`. This should improve the propagation of `trace_context` along traces that use multiple components. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
This links requests back into `xapi` from `smapi` to the xapi request that called the SM scripts in the first place. When `smapi` was experimental there would be no traceparent header passed in the request from `smapi` to `xapi`. Now, when `smapi` is experimental and the tracing is enabled for xapi, the traceparent passed to `smapi` is passed back to `xapi` for each request. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Proposing a new field field to get information about supported image formats for a given SR. This information is retrieved from SMAPIv1 plugins. Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
When a the baggage is passed as an environment variable, it is now added to the resource attributes inside of `observer.py`. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Just as the traceparent is passed back as an additional header, now the baggage is passed as well in the http request coming from `smapi` back into `xapi`. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
init.d are from initscripts which are legacy and removed from XS9 This commit drop the usage of the scripts Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
…opsd Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
query xapi db, then fallback to query domain DC get_subject_information_from_identifier query subject details from subject id. It triggers some DNS query to do kerberos query. The subject details are actually cached in xapi db and refreshed default in every 10 minutes. get_subject_information_from_identifier should query subject details from xapi DB and only fallback to DC when xapi DB does not have it. Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
…api-project#6344) query xapi db, then fallback to query domain DC get_subject_information_from_identifier query subject details from subject id. It triggers some DNS query to do kerberos query, this causes the problem that authenticating to XAPI with an AD account causes large amounts of Kerberos / DNS traffic The subject details are actually cached in xapi db and refreshed default in every 10 minutes. get_subject_information_from_identifier should query subject details from xapi DB and only fallback to DC when xapi DB does not have it.
Rrd.ds_create has optional min and max arguments (defaulting to neg_infinity and infinity respectively). Several callers would omit these parameters, resulting in ds_min and ds_max being lost during the conversion from Ds.ds to Rrd.ds. Without these, metrics couldn't be kept in range, which would result in some (such as CPU usage numbers) going negative when a domain would change its domid (over a reboot), for example. Make these parameters (alobg with mrhb) required, not optional. Requires adjusting unit tests as well. This latent behaviour was exposed during the major timestamp and plugin refactoring last year. Previously, the entire RRD was created at once by calling create_fresh_rrd. Now create_fresh_rrd is only called for the first chunk, and other chunks of the RRD call merge_new_dss, which omitted the optional arguments. Rrdd_server.add_ds also ommitted these arguments, which meant that datasources enabled at runtime would not be kept in range. Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
…project#6349) Rrd.ds_create has optional min and max arguments (defaulting to neg_infinity and infinity respectively). Several callers would omit these parameters, resulting in ds_min and ds_max being lost during the conversion from Ds.ds to Rrd.ds. Without these, metrics couldn't be kept in range, which would result in some (such as CPU usage numbers) going negative when a domain would change its domid (over a reboot), for example. Make these parameters required, not optional. Requires adjusting unit tests as well. This latent behaviour was exposed during the major timestamp and plugin refactoring last year. Previously, the entire RRD was created at once by calling create_fresh_rrd. Now create_fresh_rrd is only called for the first chunk, and other chunks of the RRD call merge_new_dss, which omitted the optional arguments. Rrdd_server.add_ds also ommitted these arguments, which meant that datasources enabled at runtime would not be kept in range.
Change Ocaml version of the example in readme to `4.14.2` as the same as the version in `xs-opam-ci.env`. Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
Change Ocaml version of the example in readme to `4.14.2` as the same as the version in `xs-opam-ci.env`.
We are stopping the management server at the end of the API call. This avoids clients connecting to this host before it has rebooted and become a pool master. Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Proposing a new field field to get information about supported image formats for a given SR. This information is retrieved from SMAPIv1 plugins.
…pi-project#6343) (doc) Describe how xc_domain_claim_pages() is used to claim pages
…opsd (xapi-project#6335) Describe `xc_vcpu_setaffinity()` and document its use by `xenguest` and `xenopsd`. This PR is my third iteration of documenting how `xenopsd` and `xenguest` interact, configure, and set the vCPU affinity. With the clarifications from @edwintorok, I now have confidence that I finally got this right.
We are stopping the management server at the end of the API call. This avoids clients connecting to this host before it has rebooted and become a pool master. Tested manually; running a BST now.
Use a simple syntax so this is easier to preview for local environment. No functional change. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Change all the link to the latest xapi version v25.11.0 Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
This reverts commit 9b1b8d4. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Xenopsd has an experimental feature that aims to optimise NUMA placement. This used to be configured by adding `numa-placement=true` to the file /etc/xenopsd.conf, which tells xenopsd to enable the feature. Later, an actual API was added to configure this: `host.set_numa_affinity_policy`. The expectation was that, while this new API should be preferred, the old xenopsd.conf option would still work for backwards compatibility reasons. This is particularly important for hosts that are upgraded to the new version. Unfortunately, while code exists in xenopsd to read and use the config option when it starts up, when xapi starts up immediately after xenopsd, it always overrides the NUMA config based its own DB field. The field type actually has a "default" option, but this gets translated to "any" (= no NUMA). By default, this means means that the experimental feature is disabled, no matter what the config file says, and can only be enabled through the API. The fix is for xapi to not assign a default value itself, but let xenopsd decide on the default policy. And xenopsd uses its config file to do so, as before. Signed-off-by: Rob Hoes <rob.hoes@citrix.com>
In Xapi_event, events are accumulated by folding over the set of tables associated with a subscriber's subscription record. These events are mostly accumulated as lists within a tuple. There is no analogue of functional record update for tuples in OCaml. This means that the separate accumulations have to cite values they will not update. By introducing records, we can only cite the fields we actually change. Signed-off-by: Colin James <colin.barr@cloud.com>
In event accumulation for event.from, the code uses a mutable variable to thread a value through event accumulation. However, this value itself is accumulated in the fold: it gets larger for each matching database event that matches a subscription. To avoid the complexity in effectively having a global, mutable variable, we drop it and make it more evident when it changes: it is changed when no events are accumulated (by grab_range). In the case that no events are accumulated, but the deadline hasn't been reached, the code tries to collect events again. It is during a retry that the last_generation needs to be bumped, as it defines the starting point by which to query the database for recent and deleted objects. In short, if no suitable events were gleaned from matching database object records since a given point, there's no point starting from there again. Signed-off-by: Colin James <colin.barr@cloud.com>
…ect#6368) Xenopsd has an experimental feature that aims to optimise NUMA placement. This used to be configured by adding `numa-placement=true` to the file /etc/xenopsd.conf, which tells xenopsd to enable the feature. Later, an actual API was added to configure this: `host.set_numa_affinity_policy`. The expectation was that, while this new API should be preferred, the old xenopsd.conf option would still work for backwards compatibility reasons. This is particularly important for hosts that are upgraded to the new version. Unfortunately, while code exists in xenopsd to read and use the config option when it starts up, when xapi starts up immediately after xenopsd, it always overrides the NUMA config based its own DB field. The field type actually has a "default" option, but this gets translated to "any" (= no NUMA). By default, this means means that the experimental feature is disabled, no matter what the config file says, and can only be enabled through the API. The fix is for xapi to not assign a default value itself, but let xenopsd decide on the default policy. And xenopsd uses its config file to do so, as before.
In order to provide event records to subscribers, we must convert the accumulated events of the form (table, objref, time) into event records. The process of doing this is simple for objects in the database. The only difference is that deletion events do not provide a snapshot (as the object has been deleted). To avoid repeating ourselves, we define an "events_of" function that accumulates event records. The function takes an argument that specifies whether an attempt to provide a snapshot should be performed. The reification of events associated with messages - which are not stored in the database - is untouched. This relies on a callback instated elsewhere. Signed-off-by: Colin James <colin.barr@cloud.com>
Further changes to turn tuples into records. Also partially uncurries `collect_events` to make its intended use as a fold more apparent. Signed-off-by: Colin James <colin.barr@cloud.com>
Int_val truncates values to a 32-bit int. Instead use Long_val, which does not suffer from this. This is a problem when claiming more than ≈ 9706GiBs for a domain. Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
…xapi-project#6372) Int_val truncates values to a 32-bit int. Instead use Long_val, which does not suffer from this. This is a problem when claiming more than ≈ 9706GiBs for a domain. Found after reading this PR ocaml/ocaml#13852
The default crypto policy in XS9 disables use of SHA1. However, swtpm needs to use it since it advertises SHA1 support to guests. On XS9, swtpm will ship with a custom openssl configuration file for this purpose so set the appropriate environment variable to use it if the file exists. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
This makes the binary unusable for testing in xapi_forkexecd Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
This makes them compilable in xs-opam Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
- forkexecd: do not tie vfork_helper to the forkexec package - opam: add missing dependencies to packages
For some reason the SR_CACHING feature was never defined in xapi.
Same as xapi-project#6306 but retaining the suppression of potential snapshot-related exceptions. Now, there is commentary, a citation of the internal CA ticket (from 2012), and a commit hash citing the introduction of the `try`-`with`. --- Clean BVT+BST (214804) and works as expected under the internal error reproduction script. These commits were cherry picked back to the top, but I am happy to squash them.
The default crypto policy in XS9 disables use of SHA1. However, swtpm needs to use it since it advertises SHA1 support to guests. On XS9, swtpm will ship with a custom openssl configuration file for this purpose so set the appropriate environment variable to use it if the file exists.
It got changed by xapi-project/sm@8499250 Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Xapi uses stunnel to connect to remote peer and exposes certificate verify error by parsing stunnel logs. And when connect with a mismatched certificate, the log from stunnel would be: stunnel 5.60 on x86_64-koji-linux-gnu platform Compiled/running with OpenSSL 3.0.9 30 May 2023 Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,OCSP,SNI Auth:LIBWRAP Reading configuration from descriptor 8 UTF-8 byte order mark not detected FIPS mode disabled Configuration successful Service [stunnel] accepted connection from unnamed socket s_connect: connected 10.63.96.116:443 Service [stunnel] connected remote server from 10.63.97.76:34138 CERT: Pre-verification error: self-signed certificate Rejected by CERT at depth=0: CN=10.63.96.116 SSL_connect: ssl/statem/statem_clnt.c:1889: error:0A000086:SSL routines::certificate verify failed Connection reset: 0 byte(s) sent to TLS, 0 byte(s) sent to socket This commit fixes the exposing of Stunnel_verify_error by checking "certificate verify failed" in the log, and expose it with reason "0A000086:SSL routines::certificate verify failed". We can find that the log "VERIFY ERROR" is not print by stunnel 5.60, which is the version of stunnel used in XS now, but it indeed was printed before: 20d6d2faf740ee5eb9b13752b076ee583fec94d8:src/verify.c: s_log(LOG_WARNING, "VERIFY ERROR: depth=%d, error=%s: %s", [gangj@xenrt10715872 stunnel]$ git branch --contains 20d6d2faf740ee5eb9b13752b076ee583fec94d8 master * private/gangj/stunnel-5.60 While we can find the log "certificate verify failed" which comes from openssl library: https://github.com/openssl/openssl/blob/openssl-3.0.9/ssl/ssl_err.c {ERR_PACK(ERR_LIB_SSL, 0, SSL_R_CERTIFICATE_VERIFY_FAILED), "certificate verify failed"}, Signed-off-by: Gang Ji <gang.ji@cloud.com>
Xapi uses stunnel to connect to remote peer and exposes certificate verify error by parsing stunnel logs. And when connecting with a corrupted certificate, the log from stunnel would be: Initializing inetd mode configuration Clients allowed=500 stunnel 5.60 on x86_64-koji-linux-gnu platform Compiled/running with OpenSSL 3.0.9 30 May 2023 Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,OCSP,SNI Auth:LIBWRAP errno: (*__errno_location ()) Initializing inetd mode configuration Reading configuration from descriptor 8 UTF-8 byte order mark not detected FIPS mode disabled No PRNG seeding was required stunnel default security level set: 2 Ciphers: ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256 TLSv1.3 ciphersuites: TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256:TLS_CHACHA20_POLY1305_SHA256 TLS options: 0x02100000 (+0x00000000, -0x00000000) Session resumption enabled No certificate or private key specified error queue: crypto/x509/by_file.c:234: error:05880009:x509 certificate routines::PEM lib error queue: crypto/pem/pem_info.c:169: error:0488000D:PEM routines::ASN1 lib error queue: crypto/asn1/tasn_dec.c:349: error:0688010A:asn1 encoding routines::nested asn1 error error queue: crypto/asn1/tasn_dec.c:1178: error:06800066:asn1 encoding routines::bad object header SSL_CTX_load_verify_locations: crypto/asn1/asn1_lib.c:95: error:0680009B:asn1 encoding routines::too long Inetd mode: Failed to initialize TLS context Configuration failed Deallocating temporary section defaults This commit exposes Stunnel_verify_error by checking "No certificate or private key specified" in the log, and expose it with reason "The specified certificate is corrupt". And the log "No certificate or private key specified" comes from stunnel: https://github.com/mtrojnar/stunnel/blob/9f291d5ba27f0fa45353ae87cf9ac5f05401b012/src/ctx.c#L690 /* load the certificate and private key */ if(!section->cert || !section->key) { s_log(LOG_DEBUG, "No certificate or private key specified"); return 0; /* OK */ } Signed-off-by: Gang Ji <gang.ji@cloud.com>
There are 4 error logs are checked in check_error: "Connection refused" "No host resolved" "No route to host" "Invalid argument" We can indeed find the logging in stunnel for 2 of them in stunnel 5.60, which is the version used in XS now: [gangj@xenrt10715872 stunnel]$ git grep -C 1 -wn "Connection refused" src/log.c-493- case 10061: src/log.c:494: return "Connection refused (WSAECONNREFUSED)"; src/log.c-495- case 10062: -- src/protocol.c-240- s_log(LOG_ERR, src/protocol.c:241: "SOCKS5 request failed: Connection refused"); src/protocol.c-242- break; [gangj@xenrt10715872 stunnel]$ [gangj@xenrt10715872 stunnel]$ git grep -C 1 -wn "Invalid argument" src/log.c-437- case 10022: src/log.c:438: return "Invalid argument (WSAEINVAL)"; src/log.c-439- case 10024: While the other 2 are not found: [gangj@xenrt10715872 stunnel]$ git grep -C 1 -wn "No host resolved" [gangj@xenrt10715872 stunnel]$ [gangj@xenrt10715872 stunnel]$ git grep -C 1 -wn "No route to host" [gangj@xenrt10715872 stunnel]$ But seems "No host resolved" was in the history of stunnel: ddef8f192ecfe195610000c6f6272f6b77b97e53:src/client.c: s_log(LOG_ERR, "No host resolved"); [gangj@xenrt10715872 stunnel]$ git branch --contains ddef8f192ecfe195610000c6f6272f6b77b97e53 master * private/gangj/stunnel-5.60 And I failed to find the log "No route to host" in any historical code of stunnel or openssl. So at least for the two errors "No host resolved" and "No route to host", I think we will need to test and fix them later. Signed-off-by: Gang Ji <gang.ji@cloud.com>
Signed-off-by: Gang Ji <gang.ji@cloud.com>
…rtificate, and expose ssl_verify_error during update syncing (xapi-project#6376) Pls refer to commit msg for details
|
||
// fork | ||
err = 0; | ||
res->pid = vfork(); |
Check warning
Code scanning / report-converter
Call to function 'vfork' is insecure as it can lead to denial of service situations in the parent process. Replace calls to vfork with calls to the safer 'posix_spawn' function Warning
&& errno == EINTR) | ||
continue; | ||
close_fd(&pipe_fds[0]); | ||
if (readed != 0 && readed < offsetof(msg_t, msg_buf) + 1) { |
Check warning
Code scanning / report-converter
comparison of integers of different signs: 'int' and 'unsigned long' Warning
int err; | ||
|
||
// compute deadline | ||
timeout_kill tm = { pid, false, false }; |
Check warning
Code scanning / report-converter
missing field 'deadline' initializer Warning
return -errno; | ||
|
||
double f = floor(timeout); | ||
tm.deadline.tv_sec += f; |
Check warning
Code scanning / report-converter
implicit conversion turns floating-point number into integer: 'double' to '__time_t' (aka 'long') Warning
|
||
double f = floor(timeout); | ||
tm.deadline.tv_sec += f; | ||
tm.deadline.tv_nsec += (timeout - f) * 1000000000.; |
Check warning
Code scanning / report-converter
implicit conversion turns floating-point number into integer: 'double' to '__syscall_slong_t' (aka 'long') Warning
For those code scan warnings, I think they are introduced by the code merge from master? |
It is introduced by #5995 |
5e3a46e
into
xapi-project:feature/guest-vm-service-aware
No conflict during merge. The code scanning warnings are introduced by #5995 , not relevant to this feature.