Increase the default number of IO queues on larger machines #56501
Conversation
The number of IO queues is currently limited to 16. On machines with more processors, increasing the number of IO queues appears to improve throughput on some benchmarks.
Some throughput numbers are below, from the arm64 Ampere machine with 80 procs. "Before the change" is with 16 IO queues, and "after the change" is with 40 IO queues. It looks like there would be some regressions along with the improvements. I'm not sure whether the regressions would be acceptable, but opening this for consideration anyway.

- Plaintext: regresses with any number of connections. This appears to be due to pipelining and the way IO queues work. Not sure if pipelining is an interesting scenario.
- Json: slight regression at 256 connections, and slight improvement at the other connection counts.
- Fortunes: looks mostly like no change.
```csharp
// parallelism of processing work queued to IOQueues. The default number below is based on the processor count and tries
// to use a high-enough number for that to not be a significant limiting factor for throughput.
int processorCount = Environment.ProcessorCount;
```
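The excerpt above ends before the actual computation. Purely as a reading aid, here is a minimal sketch of the kind of heuristic being described, assuming the 80-proc/40-queue mapping in the results above comes from halving the processor count on larger machines; the threshold and divisor here are assumptions, not necessarily the code in this PR:

```csharp
// Hypothetical sketch only; the exact threshold and divisor in this PR may
// differ. Small machines keep the old Math.Min(ProcessorCount, 16) default,
// while larger machines scale the queue count with the processor count
// (e.g. 40 queues on the 80-proc Ampere machine above).
int processorCount = Environment.ProcessorCount;
int ioQueueCount = processorCount <= 32
    ? Math.Min(processorCount, 16)
    : processorCount / 2;
```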
Nit:

```suggestion
var processorCount = Environment.ProcessorCount;
```
Isn't the explicit type preferred in this case? I thought that was the convention in the runtime repo anyway; I'm not sure about the guidelines in this repo.
Even if `var` is allowed in this repo, how is it better here?
"var
all the things" is convention here (personally I prefer the explicit type, but to be consistent w/ the codebase -> var
).
A consideration is that it seems plausible that in some cases fewer IOQueues could perform the relevant work more efficiently and sufficiently quickly for them not to be a bottleneck. Alternatives could be considered; some challenges are outlined in #41391.
> Plaintext regresses with any number of connections. This appears to be due to pipelining and the way IO queues work. Not sure if pipelining is an interesting scenario.
I don't think the plaintext scenario is super interesting on its own, because, as you note, it uses HTTP/1.1 pipelining, which is rare to see in the wild outside of benchmarking.
However, poor HTTP/1.1 pipelining performance can be indicative of poor HTTP/2 performance since HTTP/2 can effectively pipeline 100 requests at a time with the default stream limit of 100. It would be nice to get some HTTP/2 numbers before changing this.
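As a rough way to eyeball this locally (not the official benchmark setup), something like the following drives ~100 concurrent requests over a single HTTP/2 connection with `HttpClient`, approximating the default stream limit; the URL is a placeholder for whatever Kestrel endpoint is under test:

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Rough sketch: ~100 concurrent GETs multiplexed over one HTTP/2 connection.
// https://localhost:5001/ is a placeholder for the endpoint being tested.
using var handler = new SocketsHttpHandler { EnableMultipleHttp2Connections = false };
using var client = new HttpClient(handler);

var tasks = Enumerable.Range(0, 100).Select(async _ =>
{
    using var request = new HttpRequestMessage(HttpMethod.Get, "https://localhost:5001/")
    {
        Version = HttpVersion.Version20,
        VersionPolicy = HttpVersionPolicy.RequestVersionExact,
    };
    using var response = await client.SendAsync(request);
    return response.StatusCode;
});

await Task.WhenAll(tasks);
```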
The main benchmark app takes a `--threadCount` parameter, so we can also use that to test this without needing to modify the socket transport. @sebastienros has done a lot of testing using non-default IOQueueCounts for different benchmarks. I know we've hard-coded the `--threadCount` for some of those scenarios in the past, but I don't think we do currently.
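For local experimentation outside the benchmark app, the public `SocketTransportOptions.IOQueueCount` setting exposes the same knob; a minimal sketch of overriding it in a Kestrel app (the value 40 just mirrors the Ampere numbers above):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;

var builder = WebApplication.CreateBuilder(args);

// Override the socket transport's IO queue count for this run instead of
// relying on the built-in default.
builder.WebHost.UseSockets(options => options.IOQueueCount = 40);

var app = builder.Build();
app.MapGet("/", () => "Hello World!");
app.Run();
```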
> A consideration is that it seems plausible that in some cases fewer IOQueues could perform the relevant work more efficiently and sufficiently quickly for them not to be a bottleneck.
This is why we clamped the value to begin with. We erred on the side of limiting parallelism where benchmark numbers didn't show a clear improvement, which happened to be any count higher than 16 on our machines at the time.
But given we now have evidence that more parallelism can significantly help many-core machines, at least on Linux, I'm all for trying new things. The way we came up with `Math.Min(Environment.ProcessorCount, 16)` wasn't super scientific to begin with. I think it's stayed that way mostly due to inertia and fear of regressing scenarios.
> Alternatives could be considered; some challenges are outlined in #41391.
The challenges and tradeoffs are what make it interesting. I'm all for trying bigger changes too, but I don't see the harm in trying something like this first. We'll need to keep a close eye on how this affects all our benchmarks at https://aka.ms/aspnet/benchmarks, and be ready to revert if this causes significant regressions in important scenarios.
@sebastienros I can merge this if you don't have any objections.
[Edit] Never mind, I mistook one thing for another; these are not HTTP/2 benchmarks.
What would be a good benchmark to run to measure HTTP/2 multiplexing?
One of the gRPC scenarios would be good. Make sure to set the stream count to something like 70 or 100. I think the default is 1.
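To sketch what that looks like from the client side: with `Grpc.Net.Client`, many unary calls issued concurrently on one channel are multiplexed as HTTP/2 streams over a single connection. The `Greeter`/`HelloRequest` types below come from the standard greet.proto sample and are placeholders, not the actual benchmark service:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Grpc.Net.Client;

// Hypothetical sketch: ~70 concurrent unary calls multiplexed as HTTP/2
// streams over a single connection. Greeter/HelloRequest are the generated
// types from the standard greet.proto sample, used here as placeholders.
using var channel = GrpcChannel.ForAddress("https://localhost:5001");
var client = new Greeter.GreeterClient(channel);

var calls = Enumerable.Range(0, 70)
    .Select(i => client.SayHelloAsync(new HelloRequest { Name = $"client-{i}" }).ResponseAsync);

await Task.WhenAll(calls);
```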
Grpc: This is running on the Ampere machine as server (80 procs) and citrine-amd as load (48 procs), with …

Maybe a slight regression, but a high error margin too, and maybe not too significant. The test also appears to be more load-heavy than app-heavy, and I'm not sure how representative it is.