Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
✨ [Warm Replicas] Implement warm replica support for controllers. #3192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
✨ [Warm Replicas] Implement warm replica support for controllers. #3192
Changes from 31 commits
8239300
73fc8fa
be1b1c2
c9b99eb
e7a2bbf
854987c
072ad4b
43118a3
b67bc65
fc7c8c5
6bb4616
ccc7485
54f4fe3
667bb03
66e3be4
d9cc96b
65a04d5
c201bfa
5a13db4
4879527
57acc77
1987b54
a49f3a4
89f5479
9d5ddfb
66f64f0
0563114
aa20ef5
c9a2973
79a7b95
c1d8ea4
5df573f
d8650df
a03f404
dcf4b8b
ba51d28
ea2aa0e
730b30e
bca3e2a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think c.Started doesn't work anymore as expected.
With the current version of the PR it's used for two purposes:
=> 1. still works as expected. 2 leads to problems
Now the following can happen:
=> So the Source added in 2. is never started
I would suggest to
c.startedEventSources
c.startedEventSources
in l.205 instead ofc.Started
c.startedEventSources
to true instartEventSources
after we setc.startWatches
to nil@alvaroaleman does this sound correct to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense. Also, lets add a test for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d8650df fixes this but there is another race that this uncovers while testing
L336 starts a goroutine that can potentially outlive the duration of the caller holding the lock. The errGroup blocks on
case <-sourceStartCtx.Done()
, meaning that when the context is cancelled, startEventSources() doesn't wait for the the goroutine on L336 to complete before returning. This means that the mutex acquired in Warmup can be released before L339 is executed.The problem variable is
c.Queue
since it can be read on L339, but also assigned to in Start on L261.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do something as follows? Keep the current error handling behavior, but add a channel blocking until watch.Start is called in the defer block
edit: 730b30e was the attempt, but looks like it doesn't work because it fails the case where watch.Start blocks indefinitely.
edit 2:
bca3e2a I added a
hasAccessedQueueChan
channel to track whether or not c.Queue has been accessed and is safe to release the lock. Ideas for a better name is welcome :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is bigger. Does Warmup start the sources with a nil queue? (if it is called before Start)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would have to write c.Queue already in New. This also starts the queue though.
So this would also change the observable behavior of functions like NewUnmanaged / NewTypedUnmanaged.
So I think we can't move it into New.
The easiest might be to move it into startEventSourcesLocked (and rename that func to startEventSourcesAndQueueLocked)
Like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one consequence is that depending on if Warmup or Start is actually running the sync.Once in startEventSourcesLocked the queue is getting shutdown if either the Warmup or one of the other runnable groups is shutdown, but this should be fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, extremely good catch.
This makes sense to me.
I think so, yeah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make sure that our tests validate that the queue we pass to the sources is non-nil? This should've been caught by tests and not by a human ideally