-
Notifications
You must be signed in to change notification settings - Fork 292
CA-392887: set_tls_config
immediately after enabling clustering
#5644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CA-392887: set_tls_config
immediately after enabling clustering
#5644
Conversation
6909565
to
68e4040
Compare
ocaml/xapi/xapi_cluster_host.ml
Outdated
@@ -239,15 +239,15 @@ let resync_host ~__context ~host = | |||
(Ref.string_of self) ; | |||
Xapi_clustering.Daemon.enable ~__context ; | |||
maybe_switch_cluster_stack_version ~__context ~self ~cluster_stack ; | |||
let verify = Stunnel_client.get_verify_by_default () in | |||
set_tls_config ~__context ~self ~verify ; | |||
(* Note that join_internal and enable both use the clustering lock *) | |||
Client.Client.Cluster_host.enable ~rpc ~session_id ~self | |||
) ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an implicit else
branch here for the "joined and not enabled" case, which now excludes set_tls_config
as well. This is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good point. If the cluster daemon is not enabled, then in theory xapi-clusterd service itself should not be running, hence any requests to xapi-clusterd should fail. That means not only should we do the set_tls_config
inside enabled case, but the observer initialisation should be moved into the enabled path as well. Need to verify that....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there is no need to do anything if xapi-clusterd is not enabled, I have resturctured the code for this
68e4040
to
3e8cd41
Compare
requesting a review from @lindig since he originally wrote this code to make sure this refactor is valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is valid. When we wrote TLS certification code we did not think carefully enough about the sequence.
xrt tests pending, let's see wait for the tests before merging this |
@Vincent-lau how are the tests? Can we merge this? |
Let's draft this for now as it might take some time to triage the xrt issues |
3e8cd41
to
a869550
Compare
Previously xapi calls `set_tls_config` regardless of whether a host has joined or enabled, which will restart the remote server of xapi-clusterd. In the meantime, another xapi-clusterd might also be joining, which causes `distribute_state` to be called while the remote server is restarting. Now remove the `set_tls_config`, this is because `join_internal` already creates a tls_config and passes it to xapi-clusterd, but xapi-clusterd does not store that tls_config in its db, it just starts the http server with that tls config. Modifying xapi-clusterd to store that config will be done in a separate PR. Moreover, `cluster_host.enable` also calls `set_tls_config`, which means there is no need to call `set_tls_config` if the cluster host is joined but not enabled. Also move the observer and watcher creation into the not joined case, since cluster_host.enable already calls them and there is no need to call them if the host is not enabled. This does not, however, solve the whole problem. For that, we need to make sure that `distribute_state` and `set_tls_config` cannot happen at the same time. More generally, any remote calls cannot happen while `tls_config` is running. Hence we need them to hold the same lock. This will be done in xapi-clusterd. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
a869550
to
5975562
Compare
A few small refactoring to fix bugs in xrt, should be ready to go now, accompanied by a xapi-clusterd PR |
Previously xapi calls
set_tls_config
after the join call, which will restart the remote server of xapi-clusterd. In the meantime, another xapi-clusterd might also be joining, which causesdistribute_state
to be called while the remote server is restarting.Now move the
set_tls_config
immediately after the cluster daemon is enabled to reduce the chance of race. Moreover,join_internal
already creates a tls_config and passes it to xapi-clusterd, there is no point resetting the tls config immediately afterwards.This does not, however, solve the whole problem. For that, we need to make sure that
distribute_state
andset_tls_config
cannot happen at the same time. More generally, any remote calls cannot happen whiletls_config
is running. Hence we need them to hold the same lock. This will be done in xapi-clusterd.