Skip to content

CA-401242: avoid long-running, idle connections on VDI.pool_migrate #6102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 5, 2024

Conversation

psafont
Copy link
Member

@psafont psafont commented Nov 5, 2024

When a VDI.pool_migrate starts at a pool member, a connection between the
coordinator and that host remains open for the duration of the migration. This
connection is completely idle. If it's open for 12 hours, stunnel closes the
connection due to inactivity, which cancels the migration.

To avoid this use an internal API that uses short-running connection whenever
possible to avoid interrupting the migration.

Also change nested if-else flow in vdi handling to use Result with let binds, and add missing CDI operations to storage operations.

I've manually verified the fix by changing the stunnel timeout in the coordinator of a pool to 5 seconds and migrating a vdi used by a VM in another host:

# xe vdi-pool-migrate uuid=fdb8783f-8cdb-44d3-9327-992f047e6262 sr-uuid=c879ffc9-bb14-fec5-3d95-8959db9337eb
270c3dc8-27bc-4ef2-982a-a6142f0ffce1

And after reverting the patch:

# xe vdi-pool-migrate uuid=270c3dc8-27bc-4ef2-982a-a6142f0ffce1 sr-uuid=aade9dc6-ff31-3204-4187-37f82ad06c77
Cannot forward messages because the server cannot be contacted. The server may be switched off or there may be network connectivity problems.
host: 0a3a45a0-8754-4570-bfde-6ef6843ccda1 (xs2)

The changes are best reviewed one by one, and while ignoring whitespace, if using github.

@last-genius
Copy link
Contributor

might be worth specifying that the connection that's idle for 12 hours is not the main one, but the auxiliary one

When a VDI.pool_migrate starts at a pool member, a connection between the
coordinator and that host remains open for the duration of the migration. This
connection is completely idle. If the migration lasts for more than 12 hours,
stunnel closes the connection due to inactivity, which cancels the migration.

To avoid this use an internal API that uses short-running connection whenever
possible to avoid interrupting the migration.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
This allows to reduce most of the indentation in check_operation_error, which
returns searches for a single error and returns it.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
@psafont psafont force-pushed the private/paus/async-mig branch from 94a890e to 60b440b Compare November 5, 2024 13:08
@@ -22,49 +22,49 @@ open D
(**************************************************************************************)
(* current/allowed operations checking *)

let feature_of_op =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this function should go to Smint but I think it is ok to keep it here for this PR for clarity as it was here.

@psafont psafont force-pushed the private/paus/async-mig branch from 60b440b to 0132829 Compare November 5, 2024 14:22
There's no seeming reason these were missing, and they need to be added to be
able to map the VDI operations to SR ones for better error messages

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
@psafont psafont force-pushed the private/paus/async-mig branch from 0132829 to e40b3fc Compare November 5, 2024 14:24
@psafont psafont added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@psafont psafont added this pull request to the merge queue Nov 5, 2024
Merged via the queue into xapi-project:master with commit 3a776c0 Nov 5, 2024
15 checks passed
@psafont psafont deleted the private/paus/async-mig branch November 5, 2024 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants