rls: Control plane channel monitor state and back off handling #12460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

kannanjgithub wants to merge 3 commits into grpc:master from kannanjgithub:rls-control-plane-channel-monitor-state

+140 −71

Contributor

kannanjgithub commented Oct 31, 2025

Backoff timer expiry reset backoffs and update RLS picker, and RLS connectivity state change from TRANSIENT_FAILURE to READY reset all backoffs.

kannanjgithub added 2 commits

October 31, 2025 12:37


          rls: Backoff timer fix and RLS channel connectivity state behavior

dffb376

Fix test.

Unit test.

Save changes.

In-progress changes for control plane connectivity state monitoring.

Test for making backoff timer expiry not make a RLS rpc directly but update the parent LB policy with picker after deleting backoff cache entry.

Save in-progress changes.

Save changes.


          Merge remote-tracking branch 'origin/master' into rls-control-plane-c…

41f3cc2

…hannel-monitor-state

kannanjgithub requested a review from ejona86

October 31, 2025 14:32


          Remove unused code.

519dcce

Remove unused class.

ejona86 reviewed

View reviewed changes

rls/src/main/java/io/grpc/rls/CachingRlsLbClient.java

    
                    }

                  }

                  rlsChannel.notifyWhenStateChanged(currentState, () -> rlsServerConnectionStateChanged());

                  lastRlsServerConnectivityState = currentState;

Member

ejona86 Oct 31, 2025

This is unused.

rls/src/main/java/io/grpc/rls/CachingRlsLbClient.java

Comment on lines +222 to +224

    
                  lastRlsServerConnectivityState = rlsChannel.getState(false);

                  rlsChannel.notifyWhenStateChanged(

                      lastRlsServerConnectivityState, () -> rlsServerConnectionStateChanged());

Member

ejona86 Oct 31, 2025

This is assuming the current state, as the booleans aren't being updated. I think the safer thing to do is explicitly assume the current state, and the callback will run if it is different. So then you can initialize the variables assuming that initial state. IDLE would be a good guess for the initial state, and you can just pass that explicitly to notifyWhenStateChanged() here

rls/src/main/java/io/grpc/rls/CachingRlsLbClient.java

    
                private final RefCountedChildPolicyWrapperFactory refCountedChildPolicyWrapperFactory;

                private final ChannelLogger logger;

                private final ChildPolicyWrapper fallbackChildPolicyWrapper;

                private ConnectivityState lastRlsServerConnectivityState;

Member

ejona86 Oct 31, 2025

It'd be good to have a comment describing the threading behavior for these variables. (e.g., "Only accessed in rlsServerConnectionStateChanged() callback").

rls/src/main/java/io/grpc/rls/CachingRlsLbClient.java

    
                      "[RLS Entry {0}] Transition to back off: status={1}, delayNanos={2}",

                      request, status, delayNanos);

                  BackoffCacheEntry entry = new BackoffCacheEntry(request, status, backoffPolicy);

                  BackoffCacheEntry entry = new BackoffCacheEntry(request, status);

Member

ejona86 Oct 31, 2025

So we no longer do exponential backoff? I see your PR description says "Backoff timer expiry reset backoffs", but why would we want to do that? The "Failed Requests" section of the doc talks about the backoff_expiration_time_ being set for exponential backoff. We need to record the exponent somewhere while waiting for the time to expire, and it seems the BackoffCacheEntry is appropriate for that.

If you followed the code flow, I think you'd see that with the change here it'd cause the backoffPolicy argument to createBackoffEntry to always be null. refreshBackoffEntry() is the only place non-null was passed.

rls/src/main/java/io/grpc/rls/CachingRlsLbClient.java

    
                    }

                    logger.log(ChannelLogLevel.DEBUG,

                        "[RLS Entry {0}] Calling RLS for transition to pending", entry.request);

                    linkedHashLruCache.invalidate(entry.request);

Member

ejona86 Oct 31, 2025

Invalidating will cause us to forget the exponential backoff state, so if the next attempt fails we'll start over at the beginning. We need to record the exponent somewhere until the request is re-attempted. Keeping it in the cache seems fair, although we'll need to check for expired BackoffCacheEntrys after calling linkedHashLruCache.read().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet