-
Notifications
You must be signed in to change notification settings - Fork 3.9k
rls: Control plane channel monitor state and back off handling #12460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
rls: Control plane channel monitor state and back off handling #12460
Conversation
Fix test. Unit test. Save changes. In-progress changes for control plane connectivity state monitoring. Test for making backoff timer expiry not make a RLS rpc directly but update the parent LB policy with picker after deleting backoff cache entry. Save in-progress changes. Save changes.
…hannel-monitor-state
Remove unused class.
| } | ||
| } | ||
| rlsChannel.notifyWhenStateChanged(currentState, () -> rlsServerConnectionStateChanged()); | ||
| lastRlsServerConnectivityState = currentState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unused.
| lastRlsServerConnectivityState = rlsChannel.getState(false); | ||
| rlsChannel.notifyWhenStateChanged( | ||
| lastRlsServerConnectivityState, () -> rlsServerConnectionStateChanged()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is assuming the current state, as the booleans aren't being updated. I think the safer thing to do is explicitly assume the current state, and the callback will run if it is different. So then you can initialize the variables assuming that initial state. IDLE would be a good guess for the initial state, and you can just pass that explicitly to notifyWhenStateChanged() here
| private final RefCountedChildPolicyWrapperFactory refCountedChildPolicyWrapperFactory; | ||
| private final ChannelLogger logger; | ||
| private final ChildPolicyWrapper fallbackChildPolicyWrapper; | ||
| private ConnectivityState lastRlsServerConnectivityState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to have a comment describing the threading behavior for these variables. (e.g., "Only accessed in rlsServerConnectionStateChanged() callback").
| "[RLS Entry {0}] Transition to back off: status={1}, delayNanos={2}", | ||
| request, status, delayNanos); | ||
| BackoffCacheEntry entry = new BackoffCacheEntry(request, status, backoffPolicy); | ||
| BackoffCacheEntry entry = new BackoffCacheEntry(request, status); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we no longer do exponential backoff? I see your PR description says "Backoff timer expiry reset backoffs", but why would we want to do that? The "Failed Requests" section of the doc talks about the backoff_expiration_time_ being set for exponential backoff. We need to record the exponent somewhere while waiting for the time to expire, and it seems the BackoffCacheEntry is appropriate for that.
If you followed the code flow, I think you'd see that with the change here it'd cause the backoffPolicy argument to createBackoffEntry to always be null. refreshBackoffEntry() is the only place non-null was passed.
| } | ||
| logger.log(ChannelLogLevel.DEBUG, | ||
| "[RLS Entry {0}] Calling RLS for transition to pending", entry.request); | ||
| linkedHashLruCache.invalidate(entry.request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invalidating will cause us to forget the exponential backoff state, so if the next attempt fails we'll start over at the beginning. We need to record the exponent somewhere until the request is re-attempted. Keeping it in the cache seems fair, although we'll need to check for expired BackoffCacheEntrys after calling linkedHashLruCache.read().
Backoff timer expiry reset backoffs and update RLS picker, and RLS connectivity state change from TRANSIENT_FAILURE to READY reset all backoffs.