-
Notifications
You must be signed in to change notification settings - Fork 79
Support a separate timeout for listing pending requests #149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Also did a little cleanup
This reverts commit 043a010.
agent/agent.go
Outdated
var ( | ||
proxy = flag.String("proxy", "", "URL (including scheme) of the inverting proxy") | ||
proxyTimeout = flag.Duration("proxy-timeout", 60*time.Second, "Client timeout when sending requests to the inverting proxy") | ||
listRequestsTimeout = flag.Duration("list-requests-timeout", 0*time.Second, "Client timeout for listing requests from the inverting proxy. Defaults to proxy-timeout value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementations of the proxy have to coordinate with the existing proxy-timeout
flag specifically for the purposes of the calls to list pending requests (the proxy has to timeout those calls before the agent times them out or else there is a risk of requests getting dropped).
As such, I'd rather that we keep using the proxy-timeout
flag value for the calls to list pending requests, and make the new flag be used for all the other calls.
E.G. the new flag could be something like proxy-client-timeout
and be used where the proxyTimeout
variable was previously used.
That being said, we would want to update the doc string for the existing proxy-timeout
flag to indicate its update scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I went in the direction I did is because users will set their timeouts based on the expected load and latency of their backends. So it made sense to me to keep the existing flag for that purpose.
That said, I'm happy to make the change you suggest. I just want to be sure that any users who don't set the new value get the current behavior. So we could create a proxy-client-timeout but use proxy-timeout if proxy-client-timeout isn't set.
I am a bit confused by the terminology. What is a "proxy client"?
I see the proxy-agent doing 3 things: (1) getting a list of pending requests from the IVP; (2) Pulling an individual request from the IVP; (3) Forwarding a request to a backend and waiting for a result. I think #1 and #2 are both proxy clients, since they call the IVP URL.
(I understand #3 goes through a host proxy, but I think of that as an implementation detail. When I hear "proxy" in this context, I think of just the IVP.)
Would it make sense to support 3 separate timeouts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused by the terminology. What is a "proxy client"?
This would be used as the timeout set on the HTTP client used to communicate with the proxy.
I just want to be sure that any users who don't set the new value get the current behavior.
Agreed. I think the best way to do that is to require the value of the new flag to be larger than the value of the existing flag, explicitly raising it if it is not.
Would it make sense to support 3 separate timeouts?
I don't think so. The agent makes three types of calls to the Inverting Proxy:
- List the IDs of pending requests
- Fetch the contents of a single request
- Post the response for a single request
You could think of this as two broad actions:
- Poll for new requests
- Process a single request (read it and then post back the response)
So, having two separate timeouts makes sense, but having three would be overkill because the two other actions are both just part of processing a single request.
So, an alternative might be to name the new flag something like request-timeout
or request-forwarding-timeout
.
Thinking about it more, I think I like that better because it makes the existing proxy-timeout
name make more sense in comparison, as in one case we are waiting for an individual request/response, but in the other we are only waiting on the Inverting Proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. In short, the change would:
- Define a new flag called request-forwarding-timeout with a default value of 0
- Set its value to the proxy timeout if no value was provided, or if a value less than the proxy timeout was provided
- Use proxy timeout for getting the list of pending requests. Implement this with a context with a timeout.
- Use the new request-forwarding-timeout for reading requests and posting the response. This can be implemented by setting a timeout on the HTTP connection.
- Change the help message on proxy-timeout from "Client timeout when sending requests to the inverting proxy" to "Timeout for polling the inverting proxy for new requests" or similar
- Add a help message on request-forwarding-timeout to something like "Timeout for forwarding individual requests to the backend and returning a response"
WDYT?
Updated the PR. PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for putting in the effort for this fix.
Overall, this looks good to me, but I had a few minor suggestions for improvements below.
agent/agent.go
Outdated
requestForwardingTimeoutToUse := *proxyTimeout | ||
if *requestForwardingTimeout > *proxyTimeout { | ||
requestForwardingTimeoutToUse = *requestForwardingTimeout | ||
} | ||
|
||
client.Timeout = requestForwardingTimeoutToUse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client.Timeout = max(*proxyTimeout, *requestForwardingTimeout)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Using a variable so we can also log the value.
agent/agent_test.go
Outdated
ctx, cancel := context.WithCancel(context.Background()) | ||
defer cancel() | ||
|
||
backendHomeDir, err := ioutil.TempDir("", "backend-home") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use t.TempDir()
now since we've increased the minimum Go version to be greater than 1.15
: https://pkg.go.dev/testing#T.TempDir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
agent/agent_test.go
Outdated
[]string{"${GOPATH}/bin/proxy-forwarding-agent"}, | ||
"--backend=testBackend", | ||
"--proxy", proxyURL+"/", | ||
"--proxy-timeout=10ms", // Use a very short timeout to force a timeout error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a race condition which increases the risk of test flakiness.
Have you run the test repeatedly to see how reliable it is?
If you've run it more than 100 times without failure, then I think that's good enough, but otherwise let's see if we can add something to force the timeout to always occur (such as adding a sleep before the first request is sent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the test logic into a function which is run 20 times with a short timeout (timeouts are expected) and 20 with a long timeout (timeouts should not occur). I manually ran 100 times and the test was not flaky. I decreased the count because I don't want to overly slow down test execution.
agent/agent_test.go
Outdated
requestForwardingTimeout := "60s" | ||
wantTimeout := true | ||
|
||
// Run the test many times to guard against flakiness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be doing this within the test itself.
Running a test multiple times is better done just using the --count
flag to go test
.
Moreover, I don't think running in a loop would reduce flakiness; it would make any failure in any iteration cause the entire test to fail.
That seems like it would exacerbate any potential sources of flakiness rather than minimizing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the loop
agent/agent_test.go
Outdated
timeoutTest(t, proxyTimeout, requestForwardingTimeout, wantTimeout) | ||
} | ||
|
||
// Now test with a long-ish timeout to ensure we don't get timeouts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in a separate test so that if one passes and the other fails the test report makes it immediately clear where the failure was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Listing pending requests can block the proxy agent. Support a shorter timeout for that method (/pending) than for /request and /response, which are very sensitive to the use case.