Skip to content

[Backport release-1.30] Wait for lease pool shutdown when stopping components #6019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: release-1.30
Choose a base branch
from

Conversation

@twz123 twz123 added bug Something isn't working area/controlplane labels Jun 20, 2025
@twz123 twz123 force-pushed the backport-6017-to-release-1.30 branch 2 times, most recently from ed17715 to d08611d Compare June 20, 2025 12:30
twz123 added 2 commits June 20, 2025 16:25
Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com>
(cherry picked from commit 6d9e36c)
Move server count and cluster config into the reconciliation goroutine
and pass them as args to the write method. This will limit their scope
and prevent them from being inadvertently updated by other goroutines.
Remove the mutex, since the write method is only called by a single
goroutine.

Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com>
(cherry picked from commit f53d4b4)
@twz123 twz123 force-pushed the backport-6017-to-release-1.30 branch 2 times, most recently from 4ebfc59 to 2b89aeb Compare June 20, 2025 14:33
twz123 added 3 commits June 20, 2025 16:56
The controller lease counter requires the API server to be up and
running in order to work properly. It cannot acquire or count leases
before the API server is up, nor can it release the lease after the API
server is shut down.

Move the counter component down in the controller start method, so
that it's started after the API server. Decouple the actual controller
count value from the counter component by putting it directly into the
start method. Let the counter component update that value, and let the
konnectivity components consume those updates.

Introduce internal/sync.Latest as a small abstraction for values that
may change over time and will be consumed by multiple observers.

Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com>
(cherry picked from commit 3a6b68b)
The context shouldn't be stored in structs, but passed as the first
argument to functions and methods. Remove the context storage from the
lease pool and pass it to the methods where it's actually needed.

Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com>
(cherry picked from commit a7d8d12)
The components holding a lease pool weren't waiting for the pool to
shutdown when they were stopped. They simply canceled an internal
context. This led to a race condition with the API server shutdown.
There was no guarantee that the API server was still up and running when
the lease pool tried to invalidate its lease. This left the pool in a
stale state within the cluster, causing leader transitions to take much
more time than necessary.

Add a new channel as a return value to LeasePool that callers can use
to await pool termination. Wire this up in the components that use
LeasePool.

Signed-off-by: Tom Wieczorek <twieczorek@mirantis.com>
(cherry picked from commit c37a021)
(cherry picked from commit 6f23599)
(cherry picked from commit 9098da3)
@twz123 twz123 force-pushed the backport-6017-to-release-1.30 branch from 2b89aeb to 1385991 Compare June 20, 2025 14:57
@twz123 twz123 marked this pull request as ready for review June 20, 2025 18:57
@twz123 twz123 requested a review from a team as a code owner June 20, 2025 18:57
@twz123 twz123 requested review from ncopa and juanluisvaladas June 20, 2025 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controlplane bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant