
Conversation

@townsend2010
Contributor

If no timeout is set, LXD uses a hardcoded 30 second timeout when waiting on
operations to complete and if the wait timeout occurs, it can lead to incorrect
behavior in the LXD backend.

Fixes #1777

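As a rough illustration of the behavior described above (hypothetical names, not LXD's or Multipass's actual code), the effect is that a hardcoded server-side default silently replaces a missing client timeout:

```python
# Illustrative only: a server-side default wait trips up a client that sets
# no timeout of its own. DEFAULT_WAIT_SECS mirrors the hardcoded 30 seconds
# described above; the names here are hypothetical.

DEFAULT_WAIT_SECS = 30  # server-side fallback when the client sends no timeout


def effective_wait(client_timeout=None):
    """Return the wait actually applied to an operation.

    With no explicit client timeout, the server's hardcoded default wins,
    so a shutdown taking longer than 30 seconds is reported as failed even
    though the instance may still be shutting down cleanly.
    """
    return client_timeout if client_timeout is not None else DEFAULT_WAIT_SECS
```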
@codecov

codecov bot commented Oct 30, 2020

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.95%. Comparing base (ccfdaaf) to head (a76ca22).
Report is 6600 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1821   +/-   ##
=======================================
  Coverage   76.95%   76.95%           
=======================================
  Files         229      229           
  Lines        8512     8512           
=======================================
  Hits         6550     6550           
  Misses       1962     1962           


@Saviq
Collaborator

Saviq commented Nov 4, 2020

The problem I see with this is that 60s is just as arbitrary as 30s… and the "Operation cancelled" error is somewhat devoid of detail (I know…). Can we try to convert an "Operation cancelled" into, say, "Likely timed out"? Is there no detail at all available about why it was cancelled?

@townsend2010
Contributor Author

townsend2010 commented Nov 4, 2020

Well, as mentioned, without setting any timeout here at all, LXD has a hardcoded timeout of 30 seconds, which is what is tripping us up on shutdown. It seems LXD has some sort of race between the hardcoded operation timeout and the actual timeout of trying to shut down an instance. I've looked through the LXD code in this area and I really don't understand why they did what they did.

That said, setting an explicit timeout for the state operation puts the onus on the state change, not on the operation wait, so it gets around this issue. I can easily make it 30 seconds, since that is the arbitrary wait we have in other backends, but I think allowing more time for an instance to shut down is better. In fact, we've had complaints in the past about trying to cleanly shut down busy instances, and 30 seconds is not enough time.
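For illustration, the LXD REST API lets the state-change request itself carry the timeout: a PUT to `/1.0/instances/<name>/state` with a body along these lines (the 300-second value is only an example, not the value this PR necessarily uses):

```json
{
  "action": "stop",
  "timeout": 300,
  "force": false
}
```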

Regarding "Operation cancelled", this should avoid that particular issue since we really won't be waiting on the operation itself. Also, no, LXD does not really offer anything helpful in its error messages. We could catch this particular error and say something like "Timeout occurred waiting on operation to complete."
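A minimal Python sketch of that idea (the names `wait_for_operation` and `OperationError` are hypothetical, not Multipass code): translate LXD's opaque cancellation into the clearer message, and report our own client-side deadline explicitly when that is what fired.

```python
# Hypothetical sketch: poll an operation under a client-side deadline and
# replace opaque error states with clearer messages. Not actual LXD/Multipass
# code; statuses are simplified to "Running", "Success", and "Cancelled".

import time


class OperationError(Exception):
    pass


def wait_for_operation(poll, deadline_secs, interval=0.01):
    """Poll `poll()` until it returns a terminal status or the deadline passes."""
    start = time.monotonic()
    while time.monotonic() - start < deadline_secs:
        status = poll()
        if status == "Success":
            return status
        if status == "Cancelled":
            # LXD only reports "Operation cancelled"; add the likely cause.
            raise OperationError(
                "Operation cancelled (likely timed out waiting on the instance)")
        time.sleep(interval)
    # Our own deadline expired, so we can say so directly.
    raise OperationError(
        f"Timeout occurred waiting on operation to complete ({deadline_secs}s)")
```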

@Saviq
Collaborator

Saviq commented Nov 5, 2020

> That said, setting an explicit timeout for the state operation puts the onus on the state change, not on the operation wait, so it gets around this issue. I can easily make it 30 seconds, since that is the arbitrary wait we have in other backends, but I think allowing more time for an instance to shut down is better. In fact, we've had complaints in the past about trying to cleanly shut down busy instances, and 30 seconds is not enough time.

OK, so I was missing that context: that this is different from the internal LXD timeout. Should we explain in a comment?

@townsend2010
Contributor Author

> Should we explain in a comment?

Sure, I can add a comment, but it'll have to be verbose to explain why this was added. But really, adding an explicit timeout here puts us in control of the timeout and should be done regardless of working around LXD idiosyncrasies.

@townsend2010
Contributor Author

In some testing to check on some behaviors, I found some issues with this, so will continue working on it.

@townsend2010 townsend2010 marked this pull request as draft November 6, 2020 14:55
Base automatically changed from master to main March 3, 2021 13:41
@joes

joes commented Sep 22, 2022

When I try to delete a Multipass LXD instance, it quite often results in a timeout:

$ multipass delete k8-devnode-2
[2022-09-22T09:30:58.018] [error] [lxd request] Timeout getting response for GET operation on unix://multipass/var/snap/lxd/common/lxd/unix.socket@1.0/operations/36a26883-b3ec-4f5a-bd00-325cd5dfc150/wait?project=multipass
delete failed: Timeout getting response for GET operation on unix://multipass/var/snap/lxd/common/lxd/unix.socket@1.0/operations/36a26883-b3ec-4f5a-bd00-325cd5dfc150/wait?project=multipass

As for the VM, it only got stopped (not deleted):

$ multipass list
k8-devnode-2          Stopped           --               Ubuntu 20.04 LTS

Would this pull request fix this @townsend2010 or should I file a separate issue?

$ multipass version
multipass   1.10.1
multipassd  1.10.1
$ multipass get local.driver
lxd
$ lxd version
5.5

I also expose multipassd to the network and set the environment for this:

echo $MULTIPASS_SERVER_ADDRESS
multipass.intra:51005

@jibel
Collaborator

jibel commented Jan 16, 2025

Obsolete

@jibel jibel closed this Jan 16, 2025


Development

Successfully merging this pull request may close these issues.

[lxd] race-y deadlock on purge

5 participants