-
Notifications
You must be signed in to change notification settings - Fork 292
[epoll] Unix.select conversion: replace with stdext modules/calls or polly calls #5705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[epoll] Unix.select conversion: replace with stdext modules/calls or polly calls #5705
Conversation
82f4969
to
aa057ea
Compare
The previous PR for this was #4877, we should check whether all the review comments from that PR got addressed. |
I've rebased this on top of latest feature perf + master that includes the new clock module: https://github.com/edwintorok/xen-api/tree/private/edvint/epoll3-pr-updated. |
aa057ea
to
7f36fcd
Compare
This PR depends on #5766, which in turn depends on the quicktest bugfix on master (I've reverted the commit in this branch, to be able to test it, but I should eventually drop that revert once we got the proper fix). |
There were a lot of changes, rebases here to keep up with the changes on master and I found one rebase error so far:
I tried modifying the tests to detect this doubling, but it doesn't yet. Putting this back to draft, also we'll need to carefully review the code, it should be simpler, but there were a lot of rebases/fixups and might've completely broken it (I haven't looked at the diffs yet, just at unit test results, and running a BVT now). |
7f36fcd
to
1b9dbdc
Compare
1b9dbdc
to
f23e3f9
Compare
Found the bug with HA: when removing the I've now added a test for the |
ocaml/libs/xapi-stdext/lib/xapi-stdext-unix/test/unixext_test.ml
Outdated
Show resolved
Hide resolved
This perfectly demonstrates the problem of having the various units for time spans and why we have to minimise this and probably need to encode the expected unit in a label for arguments. |
I've attempted to hide these conversions inside the
In general Mtime.Span.t is what we should use, and we should hide the conversions to other units needed by syscalls/library calls in small functions/modules that are tested. |
I think this PR is "ready", but I'm waiting for |
Apparently there is a problem with the timeout tests now, will investigate tomorrow:
|
531714b
to
9f5a4ef
Compare
41287a9
to
9e638c6
Compare
Fixed a few more bugs due to differences between select/epoll with a 0 timeout, and SO_RCVTIMEO with a 0 timeout. I've added some code to detect calls to setsockopt with a timeout <1e-6 and made it raise invalid_arg to catch these (similar to what I've done with the tests on master). There is one exception: when we finish a function, we "restore" the timeout to 0, so in that case the call is allowed (I inserted the check into There was in fact already a wrapper in the Http module for this, I moved it to unixext so we can reuse it. |
The PR includes commits from master, only the latest commits should be reviewed (all authored by Edwin) |
@@ -325,16 +325,20 @@ let listen_on sock = | |||
s | |||
|
|||
let accept_conn s latest_response_time = | |||
let now = Unix.gettimeofday () in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit contains:
TODO: needs some extra testing of block_device_io, which doesn't many suitable tests.
Has this been done in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is 1 XenRT test capable of detecting breakages in the redolog, the one that got added for HA vTPM testing.
(all other HA XenRT tests used to pass with a completely broken redolog in the past).
There is a test for epoll on block devices part of the stdext quickcheck tests, although it requires root so it is not run in the CI.
I'll add some code to the XAPI quicktests so we can run some of these fuzzers in Dom0.
I think it is still worthwhile trying to add a separate test for the redolog itself that runs as part of the quicktests (it needs a blockdevice, and setting one up even in loopback mode requires root, so can't be done in the CI).
Test.fail_reportf "Timed out earlier than requested: %f < %f" | ||
elapsed_s timeout ; | ||
let elapsed = !test_elapsed in | ||
if Mtime.Span.compare elapsed timeout < 0 then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if Mtime.Span.compare elapsed timeout < 0 then | |
if Clock.Timer.span_is_shorter elapsed ~than:timeout then |
if not (wait remaining_time) then raise Timeout ; | ||
let bytes_to_read = total_bytes_to_read - !bytes_read in | ||
let bytes = | ||
try Unix.read filedesc buf !bytes_read bytes_to_read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code seems almost all duplicated, only the function called changes (write / read), can this be factored out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Xapi-fdcaps has a better factored out implementation (there is a repeat_read
and repeat_write
, although much of the code of those 2 functions could be shared too, and there is no Timer version, but one could be easily introduced).
Xapi-fdcap is what I use to test unixext, so if we switch unixext over to xapi-fdcaps directly, then it would be testing itself with itself (although the way timeout testing is done is independent of how it is implemented, so the test would still be a useful one).
Unixext could be changed to use xapi-fdcaps, although the type changes will probably ripple through the codebase. It might result in an overall safer code, because we'd know from its type whether a file descriptor is a socket or not for example (and then whether calling setsockopt is a valid call or not).
I'll check whether switching over directly to fdcaps would result in simpler code, if not I'll factor out the code here in stdext.
I should do a separate update from master PR again to make this easier to review. #5853 |
Use SO_RCVTIMEO socket option instead. This will cause EAGAIN or EWOULDBLOCK to be returned according to `socket(7)`. Need to be careful about the distinction between select 0 and SO_RCVTIMEO 0. select 0 would timeout immediately, but SO_RCVTIMEO would timeout never. Signed-off-by: Steven Woods <steven.woods@citrix.com> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This introduces a dependency between 'ezxenstore' and 'xapi-stdext-threads'. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…Unix/Lwt It was a mix of Lwt and Unix code, which means that if the Unix call blocks the entire Lwt code blocks too. This was only installed by the specfile in a -devel package. `message-cli tail --follow` can be used to debug the IDL protocol instead. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Its implementation was identical, except for the use of time_limited_read in Threadext, but the semantics is identical. Use one well tested implementation instead of duplicating code. One less function to convert to epoll. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Absolute deadlines can be represented using `Mtime.t`, or `Mtime_clock.counter` and comparing against a deadline. We hide this as an implementation detail in `Timer.t` and use counters (although an implementation using `Mtime.t` would work just as well). Relative deadlines are represented using `Mtime.Span.t`. We wrap it in `Timeout.t` to avoid accidentally mistaking it for a time duration calculated elsewhere. These helper modules correctly convert from time represented as strings (e.g. in configuration files), and calculate the remaining time until the deadline. `Mtime.Span.t` expects `float`s as nanoseconds, which can be quite error prone when converting existing code if the conversion factor is forgotten. The various `time_limited*` functions will be changed to take a `Timer.t` as argument in a followup commit. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Instead of Unix.gettimeofday. This is required to start using these functions in Jsonrpclient, which already uses Mtime. Mtime should also be more robust against clock changes on the host. TODO: needs some extra testing of block_device_io, which doesn't many suitable tests. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Steven Woods <steven.woods@citrix.com>
Note: Unixext.proxy is more efficient and doesn't recreate the epoll instance many times. But xsh is not performance critical. Signed-off-by: Steven Woods <steven.woods@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…t_timed_read Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…Unix.select Signed-off-by: Edwin Török <edwin.torok@cloud.com>
If we want to reproduce a failure we need to know the exact commandline that was used. Longer than 80 chars is not a problem, this is a logfile, and a truncated line is worse than a long line. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
9e638c6
to
ef13b44
Compare
I'm going to take a different approach with epoll, see master...edwintorok:xen-api:private/edvint/epoll4. |
Builds on top of #5704