Standalone HttpClient source code #96938
-
I'd like to make some changes to HttpClient, specifically around more efficient handling of headers: even the relatively recently added NonValidated view ( #53555 ) is still very allocation-heavy in scenarios where all headers need to be handled (or just saved in bulk for processing later).

I was planning to make the modification as part of the whole framework, but all my attempts to build my own .NET 8 on Windows and Linux failed, and trying to extract just the bits relevant to HttpClient also failed because it drags in an endless number of other files from the framework. Putting them into a different namespace for modification also breaks access to some internals in their own namespace.

In the past I think there was standalone HttpClient source code that could be compiled separately. Is that still the case, or is the whole thing now so tightly linked into the framework that it can't be improved on its own? So far it feels like building your own version of the framework is the only path forward, but that's less than ideal for sure, even if it worked perfectly.
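To make the header scenario concrete, this is roughly the kind of bulk handling I have in mind (a sketch only; `CaptureAllHeaders` is just an illustrative name, not an existing API):

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;

// Sketch only: save every response header in bulk via the NonValidated view.
// Even here, each header still surfaces as a string key plus HeaderStringValues,
// which is where the allocations pile up when all headers are needed.
static Dictionary<string, string> CaptureAllHeaders(HttpResponseMessage response)
{
    var captured = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    foreach (KeyValuePair<string, HeaderStringValues> header in response.Headers.NonValidated)
        captured[header.Key] = string.Join(", ", header.Value);

    foreach (KeyValuePair<string, HeaderStringValues> header in response.Content.Headers.NonValidated)
        captured[header.Key] = string.Join(", ", header.Value);

    return captured;
}
```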
-
The repo structure is not optimized for easily extracting all the sources for a single library. We share a lot of sources between different libraries, as you have found.
The HttpClient sources were always part of a larger repo, with the build system integrated with the rest of the repo.
It depends on how much work you want to put into your build system. You should be able to extract the required sources and build your own version of HttpClient only. It is "just work".
-
We will step up and do what it takes, thank you.
-
I am forcing HTTP/1.1 - the connection should be pooled. I've set these -
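Roughly along these lines (the exact values are illustrative; the 10-second idle timeout is the relevant one):

```csharp
using System;
using System.Net.Http;

// Illustrative handler settings - a pooled connection that sits idle
// for 10 seconds should be closed and its buffers released.
var handler = new SocketsHttpHandler
{
    PooledConnectionIdleTimeout = TimeSpan.FromSeconds(10),
    PooledConnectionLifetime = TimeSpan.FromMinutes(5),
    MaxConnectionsPerServer = 10
};
var client = new HttpClient(handler);
```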
So I was expecting buffers to be released after 10 seconds, but I don't think this is happening even over 50 seconds, which should be enough for some of the connections to give up and return their buffers. Even if the connections are hanging around, why keep an empty buffer? It should be returned to the pool and re-requested when new data comes in. I understand the protocol might require some "pinging", but does that need a full 4K buffer occupied all the time? Also, ideally there should be a way for the user to indicate that a particular request's connection does NOT need to be pooled - the user may know there will be only one connection to a given site, so pooling should not be used for it.

A lot of these headers are added when the request is made -
They are even defaults, yet this creates a lot of allocations. Instead of creating Key/Value pairs they (in my opinion) should have been kept in a byte[] - even when a value is changed by the user (how often does that happen?), it's cheap CPU-wise to recompact the whole buffer with the updated value, then write the whole lot into the request without messing with strings.

GC collections for the above test run - my understanding is that they happen mostly because the high allocation count leads the GC to run a cycle with lots of survivors, which is not surprising if Key/Value pairs get stuck in objects. GC settings -
I don't want to use server GC for this; in any case it won't make those allocations disappear.
-
I've read your link, thank you, very interesting, but how do I use those zero-byte reads? I am getting the stream like this -
I read the Stream until the bytes returned are 0. Should I explicitly ask to read 0 bytes into the buffer after no more bytes are available, and would that release the internal buffers?
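If I understand the technique correctly, the loop would look something like this sketch (`response` and `cancellationToken` come from my surrounding code; `Process` is just a placeholder):

```csharp
using System;
using System.Buffers;
using System.IO;

// Sketch of the zero-byte-read pattern as I understand it: wait for data with
// an empty buffer first, then do the real read, so neither my buffer nor the
// handler's internal one has to sit full while the connection is idle.
await using Stream stream = await response.Content.ReadAsStreamAsync(cancellationToken);

byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
try
{
    while (true)
    {
        // Completes when data is available (or the stream has ended) without
        // requiring a full buffer to be held during the wait.
        await stream.ReadAsync(Memory<byte>.Empty, cancellationToken);

        int bytesRead = await stream.ReadAsync(buffer, cancellationToken);
        if (bytesRead == 0)
            break; // end of the response body

        Process(buffer.AsSpan(0, bytesRead));
    }
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}

// Placeholder for whatever the real code does with each chunk.
static void Process(ReadOnlySpan<byte> data) { }
```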
-
By the way, I've migrated code from HttpWebRequest with its callbacks (not tasks), and I was unpleasantly surprised to see a LOT of task allocations compared to the previous solution; it's number 2 among the top allocations in my case. IMHO the good old async callbacks would have been FAR preferable - are they still available in HttpClient? It would also have been VERY helpful to be able to provide DNS info to a request - we always resolve it first to check some things, and even though the DNS server will have it cached, there is no sense doing it a second time if the data is already available. Perhaps a callback to custom-resolve DNS would have been possible?
-
Ah great about DNS, will give it a go, thanks!
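What I'm planning to try is roughly this (a sketch, assuming SocketsHttpHandler.ConnectCallback is the hook in question; in our code the plain DNS lookup would be replaced with addresses we already resolved elsewhere):

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Sockets;

var handler = new SocketsHttpHandler
{
    // Take over connection establishment so we can supply addresses we have
    // already resolved instead of letting the handler do its own DNS lookup.
    ConnectCallback = async (context, cancellationToken) =>
    {
        // Placeholder lookup: in practice we would reuse the addresses from
        // our earlier resolution instead of querying DNS again.
        IPAddress[] addresses = await Dns.GetHostAddressesAsync(
            context.DnsEndPoint.Host, cancellationToken);

        var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
        try
        {
            await socket.ConnectAsync(addresses, context.DnsEndPoint.Port, cancellationToken);
            return new NetworkStream(socket, ownsSocket: true);
        }
        catch
        {
            socket.Dispose();
            throw;
        }
    }
};
var client = new HttpClient(handler);
```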
-
My much-smarter-than-me colleague got the .NET source code compiling, so we'll shortly have a go at modifications and report back. In the meantime I've noticed one odd thing - I've set up HttpClient to accept HTTP/1.1 or lower as follows -
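Roughly this (illustrative, from memory):

```csharp
using System.Net;
using System.Net.Http;

// Illustrative: ask for HTTP/1.1 and only allow a downgrade, never an upgrade.
var client = new HttpClient(new SocketsHttpHandler())
{
    DefaultRequestVersion = HttpVersion.Version11,
    DefaultVersionPolicy = HttpVersionPolicy.RequestVersionOrLower
};
```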
Connections, when accepted, indeed declare themselves as HTTP/1.1. However, I've noticed in the profiler a lot of small byte[] allocations coming from HPack and QPack, which I believe are used in HTTP/2 and HTTP/3. To me this seems to violate the specified policy - HttpVersionPolicy.RequestVersionOrLower - and it causes a LOT of unnecessary small allocations. Is this intentional, and how is it supposed to work? Using these settings seems to stop it from happening -
I've stepped into the HttpClient source and it seems to have used HTTP/1.1 correctly, so I have no idea why the HPack/QPack allocations were happening; they seem to have been consistent with the number of requests being made.