Skip to content

Conversation

kenfreeman
Copy link
Contributor

Listing pending requests can block the proxy agent. Support a shorter timeout for that method (/pending) than for /request and /response, which are very sensitive to the use case.

agent/agent.go Outdated
var (
proxy = flag.String("proxy", "", "URL (including scheme) of the inverting proxy")
proxyTimeout = flag.Duration("proxy-timeout", 60*time.Second, "Client timeout when sending requests to the inverting proxy")
listRequestsTimeout = flag.Duration("list-requests-timeout", 0*time.Second, "Client timeout for listing requests from the inverting proxy. Defaults to proxy-timeout value")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementations of the proxy have to coordinate with the existing proxy-timeout flag specifically for the purposes of the calls to list pending requests (the proxy has to timeout those calls before the agent times them out or else there is a risk of requests getting dropped).

As such, I'd rather that we keep using the proxy-timeout flag value for the calls to list pending requests, and make the new flag be used for all the other calls.

E.G. the new flag could be something like proxy-client-timeout and be used where the proxyTimeout variable was previously used.

That being said, we would want to update the doc string for the existing proxy-timeout flag to indicate its update scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I went in the direction I did is because users will set their timeouts based on the expected load and latency of their backends. So it made sense to me to keep the existing flag for that purpose.

That said, I'm happy to make the change you suggest. I just want to be sure that any users who don't set the new value get the current behavior. So we could create a proxy-client-timeout but use proxy-timeout if proxy-client-timeout isn't set.

I am a bit confused by the terminology. What is a "proxy client"?

I see the proxy-agent doing 3 things: (1) getting a list of pending requests from the IVP; (2) Pulling an individual request from the IVP; (3) Forwarding a request to a backend and waiting for a result. I think #1 and #2 are both proxy clients, since they call the IVP URL.

(I understand #3 goes through a host proxy, but I think of that as an implementation detail. When I hear "proxy" in this context, I think of just the IVP.)

Would it make sense to support 3 separate timeouts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused by the terminology. What is a "proxy client"?

This would be used as the timeout set on the HTTP client used to communicate with the proxy.

I just want to be sure that any users who don't set the new value get the current behavior.

Agreed. I think the best way to do that is to require the value of the new flag to be larger than the value of the existing flag, explicitly raising it if it is not.

Would it make sense to support 3 separate timeouts?

I don't think so. The agent makes three types of calls to the Inverting Proxy:

  1. List the IDs of pending requests
  2. Fetch the contents of a single request
  3. Post the response for a single request

You could think of this as two broad actions:

  1. Poll for new requests
  2. Process a single request (read it and then post back the response)

So, having two separate timeouts makes sense, but having three would be overkill because the two other actions are both just part of processing a single request.

So, an alternative might be to name the new flag something like request-timeout or request-forwarding-timeout.

Thinking about it more, I think I like that better because it makes the existing proxy-timeout name make more sense in comparison, as in one case we are waiting for an individual request/response, but in the other we are only waiting on the Inverting Proxy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. In short, the change would:

  1. Define a new flag called request-forwarding-timeout with a default value of 0
  2. Set its value to the proxy timeout if no value was provided, or if a value less than the proxy timeout was provided
  3. Use proxy timeout for getting the list of pending requests. Implement this with a context with a timeout.
  4. Use the new request-forwarding-timeout for reading requests and posting the response. This can be implemented by setting a timeout on the HTTP connection.
  5. Change the help message on proxy-timeout from "Client timeout when sending requests to the inverting proxy" to "Timeout for polling the inverting proxy for new requests" or similar
  6. Add a help message on request-forwarding-timeout to something like "Timeout for forwarding individual requests to the backend and returning a response"

WDYT?

@kenfreeman
Copy link
Contributor Author

Updated the PR. PTAL

Copy link
Collaborator

@ojarjur ojarjur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for putting in the effort for this fix.

Overall, this looks good to me, but I had a few minor suggestions for improvements below.

agent/agent.go Outdated
Comment on lines 312 to 317
requestForwardingTimeoutToUse := *proxyTimeout
if *requestForwardingTimeout > *proxyTimeout {
requestForwardingTimeoutToUse = *requestForwardingTimeout
}

client.Timeout = requestForwardingTimeoutToUse
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client.Timeout = max(*proxyTimeout, *requestForwardingTimeout)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Using a variable so we can also log the value.

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

backendHomeDir, err := ioutil.TempDir("", "backend-home")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use t.TempDir() now since we've increased the minimum Go version to be greater than 1.15: https://pkg.go.dev/testing#T.TempDir

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

[]string{"${GOPATH}/bin/proxy-forwarding-agent"},
"--backend=testBackend",
"--proxy", proxyURL+"/",
"--proxy-timeout=10ms", // Use a very short timeout to force a timeout error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a race condition which increases the risk of test flakiness.

Have you run the test repeatedly to see how reliable it is?

If you've run it more than 100 times without failure, then I think that's good enough, but otherwise let's see if we can add something to force the timeout to always occur (such as adding a sleep before the first request is sent).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the test logic into a function which is run 20 times with a short timeout (timeouts are expected) and 20 with a long timeout (timeouts should not occur). I manually ran 100 times and the test was not flaky. I decreased the count because I don't want to overly slow down test execution.

requestForwardingTimeout := "60s"
wantTimeout := true

// Run the test many times to guard against flakiness
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be doing this within the test itself.

Running a test multiple times is better done just using the --count flag to go test.

Moreover, I don't think running in a loop would reduce flakiness; it would make any failure in any iteration cause the entire test to fail.

That seems like it would exacerbate any potential sources of flakiness rather than minimizing them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the loop

timeoutTest(t, proxyTimeout, requestForwardingTimeout, wantTimeout)
}

// Now test with a long-ish timeout to ensure we don't get timeouts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in a separate test so that if one passes and the other fails the test report makes it immediately clear where the failure was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ojarjur ojarjur merged commit 4c6aeb4 into google:master May 13, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants