-
Notifications
You must be signed in to change notification settings - Fork 722
SOLR-17711 index fetcher doesn't need timeout #3356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
SOLR-17711 index fetcher doesn't need timeout #3356
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks.
Can you please add to CHANGES.txt in the improvement section?
@dsmiley Just saw your message to the mailing list, thanks. So it looks like what I propose here is incidentally a no-op as it currently stands? Should I just wait for your PR then? Looks like I'd have to make this change to the |
@dsmiley I updated the CHANGES.txt under 10 but I'm wondering if this should go out in 9.9? I'd argue this is something between a bug and an optimization since the request timeouts were added very likely by accident. I've also moved the override to the recovery client in |
Until the underlying timeout propagation is fixed, this PR is in limbo. I view this as a bug, not an optimization, since the user experience is just like a bug. For users with large shards, Solr worked and then later breaks after an upgrade. Let's try to get this fixed in 9.9 somehow. It's tempting to switch back to Apache HttpClient there, in the interests of stability / confidence. WDYT? Then in parallel improve Jetty Http2SolrClient without feeling rushed for 9.9. CC @iamsanjay |
@dsmiley is the request timeout even at the HttpClient level? I didn't think it was so I don't think we have any propagation to worry about in this particular case (I think that applies to connect and idle timeouts). Also I moved the setting of the request timeout to the "root" HttpClient (recoveryOnlyClient) anyways. Also I realized I meant improvement (as you originally suggested) not optimization ... but the point about this being a bug probably still stands. |
I quickly looked at Jetty HttpClient and there's no request timeout there. It's on the Jetty Request. |
https://issues.apache.org/jira/browse/SOLR-17711
Description
There is now a default total request timeout applied to IndexFetcher because of Jetty Http2 client migration. We ran into a few problems with this already during replication of large number of shards on a single host. Sometimes we hit machine bandwidth limits because shards are downloaded in parallel. Thus requests for the next segment of each shard compete with each other and sometimes take more than 120 seconds. The 120 second number is important because it is the default idle timeout and ever since this change, the idle timeout happens to be the total request timeout in the http2 client. I don't think that is a good idea in the case of IndexFetcher, hence this change.
Solution
Revert to the pre-
Http2SolrClient
behavior of no timeout inIndexFetcher
.Tests
Not aware of an easy way to test this reasonably, i.e. to accelerate the clock.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.