-
Notifications
You must be signed in to change notification settings - Fork 292
CP-48676: Reuse pool sessions on slave logins #5986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CP-48676: Reuse pool sessions on slave logins #5986
Conversation
ocaml/xapi/xapi_session.ml
Outdated
in | ||
new_session_id | ||
in | ||
match (originator, pool, is_local_superuser, uname) with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of match
seems questionable as no destruction of the tuple is used and we still have when
and if
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point thanks, I think originally match
made sense but I refactored it to the point it no longer does.
d76a68f
to
b52c92f
Compare
ocaml/xapi/xapi_session.ml
Outdated
else | ||
let new_session_id = create_session () in | ||
with_lock reusable_pool_session_lock (fun () -> | ||
reusable_pool_session := new_session_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use a lock for writing we also need to use a lock for reading, otherwise this is not different from just having the assignment directly (at least on OCaml 4), and there is a race where a session would be created and potentially leaked on a race.
Usually the lock needs to be placed around the whole read-modify-write cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I do wonder if the performance will be worse though if the threads have to wait for each other to get a session?
b52c92f
to
510b17d
Compare
ocaml/xapi/xapi_session.ml
Outdated
(try Db.Session.destroy ~__context ~self with _ -> ()) ; | ||
Rbac.destroy_session_permissions_tbl ~session_id:self | ||
with_lock reusable_pool_session_lock (fun () -> | ||
if self <> !reusable_pool_session then ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume we also need to protect this read with the lock @edwintorok ? Otherwise one thread could assign to reusable_pool_session just after this is read, and then the session would be destroyed?
You are referring to the following code in login_no_password_common:
But does anyone understand why this is needed at all? That is, in what way in the time "incorrect" and how does the API call make it correct? The session is created in the DB in this function with I guess issues may occur if the pool member doing the login and the coordinator do not have the same time on their clocks. But if there is the case I bet lots of stuff may go wrong... I found the original ticket now that caused this code to be added. It was due to a "clock skew" of 9 (!) days between the two hosts, which meant that the session got deleted by the GC as it was "not used" for 24 hours :) |
510b17d
to
f732223
Compare
@robhoes I did some testing with removing the pool.get_all when creating a session and it was (obviously) faster than master (by about 3 seconds) but using reusable session was still 1s faster than even that. So it depends if the 9 day clock skew is likely to reappear or not, we could remove the pool.get_all? But I think reusable sessions is still worth it in either case. Removing pool.get_all from session creation would still be worth doing if possible, as it would work in all the occasions where we can't reuse the session i.e. when it's not an internal_xapi_master_to_xapi_slave_login |
Ring3: BVT + BST suiterun 205345 was all green with the latest changes (it looks like moving the lock to the appropriate places fixed the last few errors) |
I've run two pool bootstorm jobs, one for toolstack-next and one for my build tag with these changes, job ids 4116362 and 4116900 respectively. If you search the jaeger kfd endpoint for these job ids, you can see the speedups provided e.g. Xapi_session.login_no_password_common taking 2.79ms on toolstack-next and 0.14ms on my build tag. |
f732223
to
5779b35
Compare
Rebased on master to fix merge conflict |
ocaml/xapi/xapi_session.ml
Outdated
~rbac_permissions ~db_ref ~client_certificate = | ||
Context.with_tracing ~originator ~__context __FUNCTION__ @@ fun __context -> | ||
let internal_xapi_master_to_xapi_slave_login = | ||
(xapi_internal_originator, true, true, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to just put this tuple into the if-statement below, as on it's own these values are meaningless (what Booleans are these etc.).
if (originator, pool, is_local_superuser, uname) =
(xapi_internal_originator, true, true, None)
makes this clearer.
ocaml/xapi/xapi_session.ml
Outdated
Rbac_audit.session_destroy ~__context ~session_id:self ; | ||
(try Db.Session.destroy ~__context ~self with _ -> ()) ; | ||
Rbac.destroy_session_permissions_tbl ~session_id:self | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's log the else-case here, so that it is clear why there is no Session.destroy
in the logs as part of a session.logout
call.
Prevent this reusable pool session from being destroyed so that it remains valid. This feature can be toggled with the flag reuse-pool-sessions. Signed-off-by: Steven Woods <steven.woods@citrix.com>
Add a new flag validate-reusable-pool-session to xapi globs which skips the reusable pool session validity check if it is false. This saves time as we are no longer calling pool.get_all for each session. Signed-off-by: Steven Woods <steven.woods@citrix.com>
5779b35
to
c27b1d4
Compare
This was first introduced in #5986 but was later reverted as the mutex it introduced caused a deadlock with the DB mutex. This PR has now been reworked to use an Atomic variable, removing the need for the mutex. This PR allows pool sessions to be reused (if the flag is turned on, which it is by default). This greatly speeds up communications between hosts which is important for repetitive calls e.g. one customer was running get_servertime frequently on all hosts in a pool which was causing problems due to the length of each call. When we get this reusable session, we can optionally validate the session before using it but this increases the duration of the call, so this is off by default. Excepting a toolstack restart, the reusable session should always be valid as it is excluded from deletion in destroy_db_session
This PR allows pool sessions to be reused, rather than wasting time creating new ones each time to talk to the slaves.
The average time spent across 15 slave get_servertime xenapi calls on a virtual pool was 20.97ms with reuse-pool-sessions = false.


Compared to an average of 17.11ms with reuse-pool-sessions = true
In particular, the part of the request that has actually been changed, the login_no_password call has gone from about 2.41ms to 0.008ms, 300x faster than previously.
One part of why this speedup is possible is because we no longer have to do a pool.get_all for each session (to force the newly created session's time to update). If we were to validate the reusable pool session is valid before use, we would still need to do this pool.get_all. However, as we are now no longer destroying the db session after use, it should continue to be valid. As such, to achieve the best speed increase, this option of validating the reusable pool session (validate-reusable-pool-session in xapi.conf) is off by default.