Increase the default number of IO queues on larger machines #56501
Conversation
The number of IO queues is currently limited to 16. On machines with more processors, increasing the number of IO queues appears to improve throughput on some benchmarks.
Some throughput numbers are below, from the arm64 Ampere machine with 80 procs. "Before the change" is with 16 IO queues, and "after the change" is with 40 IO queues. It looks like there would be some regressions along with the improvements. I'm not sure whether the regressions would be acceptable, but opening this for consideration anyway.

- Plaintext: regresses with any number of connections. This appears to be due to pipelining and the way IO queues work. Not sure if pipelining is an interesting scenario.
- Json: slight regression at 256 connections, and slight improvement at the other connection counts.
- Fortunes: looks mostly like no change.
```csharp
// parallelism of processing work queued to IOQueues. The default number below is based on the processor count and tries
// to use a high-enough number for that to not be a significant limiting factor for throughput.
int processorCount = Environment.ProcessorCount;
```
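The excerpt above ends before the actual computation. Purely as a reading aid, here is a minimal sketch of the kind of heuristic being described, assuming the 80-proc/40-queue mapping in the results above comes from halving the processor count on larger machines; the threshold and divisor here are assumptions, not necessarily the code in this PR:

```csharp
// Hypothetical sketch only; the exact threshold and divisor in this PR may
// differ. Small machines keep the old Math.Min(ProcessorCount, 16) default,
// while larger machines scale the queue count with the processor count
// (e.g. 40 queues on the 80-proc Ampere machine above).
int processorCount = Environment.ProcessorCount;
int ioQueueCount = processorCount <= 32
    ? Math.Min(processorCount, 16)
    : processorCount / 2;
```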
Nit:

```suggestion
var processorCount = Environment.ProcessorCount;
```
Isn't the explicit type preferred in this case? I thought that was the convention in the runtime repo anyway; I'm not sure about the guidelines in this repo.
Even if `var` is allowed in this repo, how is it better here?
"var
all the things" is convention here (personally I prefer the explicit type, but to be consistent w/ the codebase -> var
).
A consideration is that it seems plausible that in some cases fewer IOQueues could perform the relevant work more efficiently and sufficiently quickly for them not to be a bottleneck. Alternatives could be considered; some challenges are outlined in #41391.
> Plaintext regresses with any number of connections. This appears to be due to pipelining and the way IO queues work. Not sure if pipelining is an interesting scenario.
I don't think the plaintext scenario is super interesting on its own, because, as you note, it uses HTTP/1.1 pipelining, which is rare to see in the wild outside of benchmarking.
However, poor HTTP/1.1 pipelining performance can be indicative of poor HTTP/2 performance since HTTP/2 can effectively pipeline 100 requests at a time with the default stream limit of 100. It would be nice to get some HTTP/2 numbers before changing this.
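As a rough way to eyeball this locally (not the official benchmark setup), something like the following drives ~100 concurrent requests over a single HTTP/2 connection with `HttpClient`, approximating the default stream limit; the URL is a placeholder for whatever Kestrel endpoint is under test:

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Rough sketch: ~100 concurrent GETs multiplexed over one HTTP/2 connection.
// https://localhost:5001/ is a placeholder for the endpoint being tested.
using var handler = new SocketsHttpHandler { EnableMultipleHttp2Connections = false };
using var client = new HttpClient(handler);

var tasks = Enumerable.Range(0, 100).Select(async _ =>
{
    using var request = new HttpRequestMessage(HttpMethod.Get, "https://localhost:5001/")
    {
        Version = HttpVersion.Version20,
        VersionPolicy = HttpVersionPolicy.RequestVersionExact,
    };
    using var response = await client.SendAsync(request);
    return response.StatusCode;
});

await Task.WhenAll(tasks);
```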
The main benchmark app takes a `--threadCount` parameter, so we can also use that to test this without needing to modify the socket transport. @sebastienros has done a lot of testing using non-default IOQueueCounts for different benchmarks. I know we've hard-coded the `--threadCount` for some of those scenarios in the past, but I don't think we do currently.
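For local experimentation outside the benchmark app, the public `SocketTransportOptions.IOQueueCount` setting exposes the same knob; a minimal sketch of overriding it in a Kestrel app (the value 40 just mirrors the Ampere numbers above):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;

var builder = WebApplication.CreateBuilder(args);

// Override the socket transport's IO queue count for this run instead of
// relying on the built-in default.
builder.WebHost.UseSockets(options => options.IOQueueCount = 40);

var app = builder.Build();
app.MapGet("/", () => "Hello World!");
app.Run();
```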
> A consideration is that it seems plausible that in some cases fewer IOQueues could perform the relevant work more efficiently and sufficiently quickly for them not to be a bottleneck.
This is why we clamped the value to begin with. We erred on the side of limiting parallelism where benchmark numbers didn't show a clear improvement, which happened to be any count higher than 16 on our machines at the time.
But given we now have evidence that more parallelism can significantly help many-core machines, at least on Linux, I'm all for trying new things. The way we came up with `Math.Min(Environment.ProcessorCount, 16)` wasn't super scientific to begin with. I think it's stayed that way mostly due to inertia and fear of regressing scenarios.
> Alternatives could be considered; some challenges are outlined in #41391.
The challenges and tradeoffs are what make it interesting. I'm all for trying bigger changes too, but I don't see the harm in trying something like this first. We'll need to keep a close eye on how this affects all our benchmarks at https://aka.ms/aspnet/benchmarks, and be ready to revert if this causes significant regressions in important scenarios.
@sebastienros I can merge this if you don't have any objections.
[Edit] Never mind, I mistook one thing for another; these are not HTTP/2 benchmarks.
What would be a good benchmark to run to measure HTTP/2 multiplexing?
One of the gRPC scenarios would be good. Make sure to set the stream count to something like 70 or 100. I think the default is 1.
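To sketch what that looks like from the client side: with `Grpc.Net.Client`, many unary calls issued concurrently on one channel are multiplexed as HTTP/2 streams over a single connection. The `Greeter`/`HelloRequest` types below come from the standard greet.proto sample and are placeholders, not the actual benchmark service:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Grpc.Net.Client;

// Hypothetical sketch: ~70 concurrent unary calls multiplexed as HTTP/2
// streams over a single connection. Greeter/HelloRequest are the generated
// types from the standard greet.proto sample, used here as placeholders.
using var channel = GrpcChannel.ForAddress("https://localhost:5001");
var client = new Greeter.GreeterClient(channel);

var calls = Enumerable.Range(0, 70)
    .Select(i => client.SayHelloAsync(new HelloRequest { Name = $"client-{i}" }).ResponseAsync);

await Task.WhenAll(calls);
```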
Grpc: This is running on the Ampere machine as server (80 procs) and citrine-amd as load (48 procs), with …

Maybe a slight regression, but a high error margin too, and maybe not too significant. The test also appears to be more load-heavy than app-heavy, and I'm not sure how representative it is.