-
Notifications
You must be signed in to change notification settings - Fork 292
Refactor SXM for SMAPIv1 code to fit the new code structure #6423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor SXM for SMAPIv1 code to fit the new code structure #6423
Conversation
58aae78
to
45e0df2
Compare
Currently error handling was done by building up a list of cleanup functions in | ||
the `on_fail` list ref as the function executes. For example, if the `receive_start` | ||
has been completed successfully, add `receive_cancel` to the list of cleanup functions. | ||
And whenever an exception is encountered, just execute whatever has been added | ||
to the `on_fail` list ref. This is convenient, but does entangle all the error | ||
handling logic with the core SXM logic itself, making the code rather than hard | ||
to understand and maintain. | ||
|
||
The idea to fix this is to introduce explicit "stages" during the SXM and define | ||
explicitly what error handling should be done if it fails at a certain stage. This | ||
helps separate the error handling logic into the `with` part of a `try with` block, | ||
which is where they are supposed to be. Since we need to accommodate the existing | ||
SMAPIv1 migration (which has more stages than SMAPIv3), the following stages are | ||
introduced: preparation (v1,v3), snapshot(v1), mirror(v1, v3), copy(v1). Note that | ||
each stage also roughly corresponds to a helper function that is called within `MIRROR.start`, | ||
which is the wrapper function that initiates storage migration. And each helper | ||
functions themselves would also have error handling logic within themselves as | ||
needed (e.g. see `Storage_smapiv1_migrate.receive_start) to deal with exceptions | ||
that happen within each helper functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this helps reviewing the changes (thanks!), I think there's no need to document how the old system works after the change is done :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was trying to document how the new error handling is done, but then I thought it would be difficult to understand the error handling at different stages in the SMAPIv1 SXM without understanding how they are splitting into multiple stages... So I wrote it anyway
I think Rob should review this, otherwise it looks like a very positive change |
45e0df2
to
6596419
Compare
6596419
to
615de98
Compare
We define different failures for SXM, for different stages of the SXM: - preparation failure: raised when preparing the environment for SXM, for example, when the dest host creates VDIs for data mirroring (SMAPIv1 and v3); - mirror_fd_failure: happens when passing fds to tapdisks for mirroring (SMAPIv1 only); - mirror_snapshot_failure: raised when taking a snapshot as the base image before copying it over to the destination (SMAPIv1 only); - mirror_copy_failure: raised when copying of the base image fails; - mirror_failure: raised when there is any issues that causes the mirror to crash during SXM (SMAPIv1 and v3) Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Although copying is not exclusively used by SMAPIv1, e.g. DATA.copy maps to Storage_migrate.copy which maps to Storage_smapiv1_migrate.copy. And this is used when a VDI does not need to be migrated live (attached read-only). It's a bit easier in terms of dependency to leave them in storage_smapiv1_migrate.ml And some related helper functions such as `progress_callback`. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Because the latter needs to use the former. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
The original Storage_migrate.start function is quite fat and does lots of things. For better readability and maintainability, split it into multiple functions that represent different stages of the SXM: - prepare: makes a remote call to the dest host and asks it to prepare for the VDIs for receiving data - mirror_pass_fds: passes the fd to tapdisk to establish connection between two tapdisk processes (sending & receiving) - mirror_snapshot: takes a snapshot of the VDI to be mirrored, because the tapdisk mirroring only mirrors data written after the mirror is initiated - mirror_copy: copy the snapshot from the source to destination As these operations are all specific to SMAPIv1, move them to `storage_smapiv1_migrate` as well. Also restructure the way clean up is done. As there are multiple points at which the migration might fail, previously a `on_fail` list ref is constructed and populated as we go. Now that we have explicitly defined the different errors during the migration, we could handle these errors accordingly. The way to reason about cleanups is: we have three possible clean up actions: - receive_cancel: which will undo the preparation work done on the dest host. This is added once preparation is done; - destroy snapshot: which will destroy the snapshot. This is called after the snapshot is created; - stop: which will disable mirroring, delete snapshot and also do what receive_cancel does, which is basically clean up everything. And this will be called after the mirror has been started. Note many of these functions are best-effort, for example, the stop function won't delete a snapshot if it does not exist, so it's generally safe to call them. We put these clean up functions into the appropriate places, see the actual arrangement of these functions in the code, they should be self-explanatory. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Now that the split is done, we can implement the `send_start` function defined earlier for SMAPIv1, by combining all the different stages into one function and invoke it from `Storage_migrate.start`. At this point the refacotring for SMAPIv1 should be finished, and there should still be no functional change. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Continuing on #6423 This PR changes the way `receive_start2` is called, and it can now multiplex based on the source SR and destination SR type. Note that `receive_start2` is introduced for inbound SXM for SMAPIv3, which is flagged off by default, so no need to worry about backwards compatability. Also introduces a couple of new types for SMAPIv3 migration, but they are uncurrently unused. There should still be no functional change. More to come...
This is a continuation of #6404, which completes the refactoring of SXM code for the new architecture.
I expect there to be no functional change, although there is a significant change of how error handling is done.
The last couple of commits contain the design for this PR.
To be continued...