
H2 related corrections and optimizations. #1411


Open · 15 of 17 tasks

aleksostapenko opened this issue May 21, 2020 · 12 comments

@aleksostapenko commented May 21, 2020

Bugs

  • The following closed-streams limit condition was reported as buggy:

    if (hclosed_streams->num <= TFW_MAX_CLOSED_STREAMS
        && (max_streams > ctx->streams_num
            || !hclosed_streams->num))
    {
        . . .
    }

    MekhanikEvgenii - Fixed by #2351

  • Duplicated cookie headers should be concatenated into one header before forwarding to an HTTP/1.1 connection (RFC 7540 section 8.1.2.5, "Compressing the Cookie Header Field"); see Forwarding http2 splitted cookie header for #1736. A sketch follows below.
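
    A minimal sketch of this concatenation, using plain NUL-terminated strings for illustration (the real code would operate on Tempesta's chunked TfwStr descriptors, not heap strings):

    #include <stdlib.h>
    #include <string.h>

    /* Join HTTP/2 cookie crumbs with "; " into a single header value,
     * per RFC 7540 section 8.1.2.5. Returns a malloc'ed string or NULL.
     */
    static char *
    concat_cookie_crumbs(const char *const *crumbs, size_t n)
    {
        size_t i, len = 0;
        char *hdr, *p;

        for (i = 0; i < n; i++)
            len += strlen(crumbs[i]) + 2;   /* value + "; " separator */
        if (!n || !(hdr = malloc(len + 1)))
            return NULL;

        for (p = hdr, i = 0; i < n; i++) {
            size_t l = strlen(crumbs[i]);

            memcpy(p, crumbs[i], l);
            p += l;
            if (i + 1 < n) {
                *p++ = ';';
                *p++ = ' ';
            }
        }
        *p = '\0';
        return hdr;
    }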

  • It looks like a bug here, since the function exits in any case and the remainder of the eolen in the next fragment is not deleted; besides, the tfw_http_msg_del_eol() function is completely unused for now, as is skb_next_data(), so these procedures can probably be removed;
    MekhanikEvgenii - This function has already been removed.

  • (TBD) For now TempestaFW closes the connection on any error during message parsing, but on an HTTP/2 connection a malformed message should be treated as a stream error rather than a connection-wide error, meaning that only the stream should be closed, not the connection (RFC 7540 section 8.1.2.6, "Malformed Requests and Responses");
    MekhanikEvgenii - We decided that we can't do this, because the client adds all sent headers to its HPACK dynamic table. When we can't parse a message, we don't save its headers in the HPACK dynamic table, and moreover we discard the remaining message data after the error occurs, so we can't serve new requests from such a client.

  • Looks like this assignment should be moved, see TDBv0.2: Cache background revalidation and eviction #515:

    (*r)->len = curr_ptr - (*r)->data
  • Currently the stream pointer is reset to avoid a double call of the stream FSM if an error occurs during response generation. This works for cases where the possible tfw_h2_stream_id_close() calls are close to each other (e.g. in the tfw_h2_resp_adjust_fwd() procedure) and stream_id can be saved locally between the calls, but it is not suitable for cases where the tfw_h2_stream_id_close() calls are far apart. An example of such a case is failed generation of a redirection response: if tfw_h2_stream_id_close() has been called in tfw_h2_prep_redirect() but an error occurs before the response is actually created and sent, then tfw_h2_send_resp() will not send an error response to the client, since req->stream is already NULL. To resolve the problem in a general way, tfw_h2_stream_id_close() should be changed slightly: assign the id to a special new internal field of req->pit (instead of returning it, as now) and check that field during subsequent processing (i.e. when an error response must be generated after a failed response forwarding or a failed redirection response generation) before calling tfw_h2_stream_id_close(). If the field is set (non-zero), then tfw_h2_stream_id_close() has already been called and need not be called again - just use the id saved in the field; if the field is not set (zero), tfw_h2_stream_id_close() should be called. This field needs no synchronization, since (unlike the req->stream field) it is accessed only from the one thread currently processing the request (during forwarding the request to the server, or forwarding the corresponding response to the client). This approach should completely avoid the problem of multiple calls to tfw_h2_stream_id_close() during internal response generation (including failed redirection response generation) and during server response forwarding. A sketch of the proposed pattern follows below.
    MekhanikEvgenii - This problem was fixed across several PRs. First, this part of the code was reworked when frame making was moved to the xmit callback, and when stream processing (the send part) was moved to the xmit callback as well. The problem was finally fixed in da24b86
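
    For reference, a sketch of the originally proposed pattern (the field name closed_stream_id and the exact signatures are assumptions, not the actual fix that landed):

    /* Close the stream once and remember its id for later error paths:
     * req->stream may already be NULL by the time an error response is
     * generated, but the saved id stays valid. The field is accessed
     * only from the thread processing this request (unlike req->stream),
     * so no synchronization is needed.
     */
    static unsigned int
    tfw_h2_req_close_stream_once(TfwHttpReq *req)
    {
        if (!req->pit.closed_stream_id)
            req->pit.closed_stream_id = tfw_h2_stream_id_close(req);
        return req->pit.closed_stream_id;
    }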

Optimizations

  • It seems that a single call of the tfw_http_hdr_split() procedure can be used here instead of the tfw_http_msg_srvhdr_val() call and the tfw_h2_msg_hdr_length() call below (i.e. for now @h_val and @s_val are two variables for the same instance);
    MekhanikEvgenii - Will be fixed by Remove unnecessary tfw_http_hdr_split #2368. It seems that we can remove tfw_http_hdr_split instead of tfw_http_msg_srvhdr_val.

  • In the response forwarding case (the TFW_H2_TRANS_INPLACE type of H2 encoding operations) the tfw_http_hdr_split() procedure is called twice for the same header: here and here; it may be worth calling this procedure only once, but earlier - at the beginning of the @tfw_hpack_encode() function, before the tfw_hpack_encoder_index() call. See also Rewrite tfw_hpack_node_compare to make it clean & fast #1920 (comment) and #1920 (comment).
    MekhanikEvgenii - fixed in Call tfw_http_hdr_split only during hpack encoding #2367

  • In the case of a Huffman-encoded header name and/or value, it is worth allocating (and assigning to it->rspace) a bit more memory here; e.g., as mentioned in Exploring HTTP/2 Header Compression section 3.2, Huffman compresses by about 20% on average, so allocating about 125% of the initial Huffman-encoded size (i.e. res_len = len + len/4) will in most cases avoid additional space allocations and TfwStr expansions in the tfw_hpack_huffman_write() procedure (note that in this case the warnings for it->rspace here and in other similar places should be removed); see the sizing sketch below.
    MekhanikEvgenii - will be fixed in Mekhanik evgenii/1411 7 #2403
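
    A sketch of the proposed sizing, directly from the formula above (~20% average Huffman savings, so 125% of the encoded length usually covers the decoded size):

    static inline size_t
    tfw_hpack_huffman_dec_space(size_t len)
    {
        /* res_len = len + len/4, i.e. ~125% of the encoded size. */
        return len + len / 4;
    }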

  • If the tfw_hpack_exp_hdr() function is called here, its allocations intersect with the it->parsed_hdr descriptor allocations in req->pool, and the subsequent parsing of the header's value will require full re-allocations in req->pool (instead of the fast path) in the tfw_pool_realloc() procedure. This can be avoided either by adding another pool (specifically for it->hdr descriptors), or by re-initializing the it->hdr descriptor. In the latter variant it->hdr would be reinitialized here in any case, with new values of it->pos and length (as in the BUFFER_NAME_OPEN() macro), and the old descriptor(s) in it->hdr for the header's name would simply be forgotten (they are no longer needed, since the name is parsed by then and all necessary information is already in it->parsed_hdr). Note, however, that in this variant the intersection with the it->parsed_hdr descriptor allocations in req->pool can still occur if it->hdr became compound during Huffman decoding of the header's value; but considering the changes proposed in p.3 above, calling tfw_hpack_exp_hdr() from tfw_hpack_huffman_write() should be very rare, so in most cases it->hdr should remain plain and intersections will not occur;

  • For now TempestaFW does not know the static indexes for headers which are defined in the configuration file for adding, substituting or appending, but these headers are fully known at configuration time, so it is worth trying to determine static indexes for them. Two ways look possible: either allow specifying static indexes for the corresponding headers directly in the configuration file, or implement something like the tag search from Exploring HTTP/2 Header Compression section 3.1 (the part devoted to headers defined in the static table) at configuration time (see the illustration below). Done in Fixed framing error for resposnses received from cache with header modification #1831.
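
    An illustration of the second way, assuming a hypothetical configuration-stage helper (tfw_cfg_hdr_static_idx is not a real function); the table fragment is taken from the RFC 7541 Appendix A static table:

    #include <string.h>
    #include <strings.h>

    static const struct {
        unsigned int idx;
        const char   *name;
    } hpack_static_tbl[] = {
        { 24, "cache-control" },
        { 31, "content-type" },
        { 54, "server" },
        { 58, "user-agent" },
    };

    /* Resolve the HPACK static-table index for a configured header
     * once, at configuration time, instead of searching per request.
     */
    static unsigned int
    tfw_cfg_hdr_static_idx(const char *name)
    {
        size_t i;

        for (i = 0; i < sizeof(hpack_static_tbl)/sizeof(*hpack_static_tbl); i++)
            if (!strcasecmp(name, hpack_static_tbl[i].name))
                return hpack_static_tbl[i].idx;
        return 0;   /* no static index - fall back to a literal/dynamic entry */
    }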

  • Since the tfw_h2_stream_id() function is called only from the request receive path, and there must be no concurrent access to req->stream until the request is passed to the server connection, the locking in tfw_h2_stream_id() can be removed;
    MekhanikEvgenii - there is a PR Mekhanik evgenii/1411 3 #2365 which should fix this problem. In fact this point is not correct: when the cache is not sharded we don't need to call it at all, because stream_id can't be 0 here. If the cache is sharded there can be concurrent access to req->stream, because cache processing can run on another CPU. But there is also no sense in checking stream_id, because the stream can be closed at any time! For example, if stream_id is not zero before tfw_cache_build_resp, there is no guarantee that it will not become zero during this function call. We check that req->stream is not NULL when calling tfw_h2_stream_init_for_xmit, where we access the req->stream pointer.

  • Perhaps it would be wise to create a version of the tfw_h2_stream_id_close() procedure for cases when it is called from the request receive path (e.g. from request processing in the cache - here and here), since there must be no concurrent access to req->stream there, and it must stay valid (non-NULL) until the request is passed to the server connection.
    MekhanikEvgenii - same as the previous point: PR Mekhanik evgenii/1411 3 #2365 should fix this; see the note above.

@krizhanovsky commented May 27, 2020

The current HTTP/2 implementation and TODO issues must be revised according to RFC 9113.

@const-t commented Oct 28, 2022

@EvgeniiMekhanik commented Jun 3, 2024

  • During HTTP/2 frame making we transform the skb each time we insert a frame header. Maybe it would be a good idea to allocate a page, use it, and get() it for all inserted headers, or to reserve some more room in the allocated skb pages. This is probably an unlikely case, see the discussion Mekhanik evgenii/1196 itog 2 #1973 (comment).
  • Implement optimization for non-progressive resources https://tempesta-tech.com/knowledge-base/HTTP2-streams-prioritization/#tempesta-fw - we should not share bandwidth between same-weight sibling streams. See the discussions below: we need RFC 9218, which is exactly for non-progressive resources.
  • Introduce lag (see the EEVDF CPU scheduler), which adjusts the deficit by the difference between the ideal and real numbers of sent bytes. (Basically, a network scheduler should provide a fair number of bytes sent for each of the streams, see e.g. https://en.wikipedia.org/wiki/Deficit_round_robin. However, a context switch between streams also has an overhead - it involves at least a stream reinsertion in the ebtree - so it is better to be fair in terms of the number of frames: for a selected stream, send the minimum between the allowed frame size and the bytes to be sent for the stream.) See the DRR sketch after this list. Agreed to not do in H2 related corrections and optimizations. #1411 (comment)
  • Implement a good performance test in real conditions. At least use Chrome Lighthouse to test performance.
  • On each tree traversal (and we have a lot of them on every frame making!) we touch several pages, stressing the CPU caches. This must be fixed. Maybe we can (1) take advantage of the fact that the streams are (mostly) per-CPU (since we have TCP connection locality), and (2) place not only streams in the same memory area - client accounting, TLS, HPACK and other data structures could go there too, to improve overall memory locality. Done in Rework allocation for stream schedulers. #2160
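
A hedged sketch of the classic DRR accounting referenced above (generic code, not the Tempesta scheduler); the lag idea would additionally correct the deficit by (ideal - real) sent bytes:

    struct drr_stream {
        unsigned int  weight;   /* 1..256, RFC 7540 weight */
        long          deficit;  /* bytes this stream may send */
        unsigned long pending;  /* bytes queued on the stream */
    };

    /* One DRR visit: top up the deficit by a weight-proportional
     * quantum and return how many bytes to send now.
     */
    static unsigned long
    drr_visit(struct drr_stream *s, unsigned long quantum,
              unsigned long max_frame)
    {
        unsigned long n;

        s->deficit += quantum * s->weight / 256;
        if (s->deficit <= 0 || !s->pending)
            return 0;
        /* Be fair in frames, not bytes:
         * send min(frame size, pending, deficit).
         */
        n = s->pending < max_frame ? s->pending : max_frame;
        if ((long)n > s->deficit)
            n = s->deficit;
        s->deficit -= n;
        s->pending -= n;
        return n;
    }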

@krizhanovsky
Implement optimization for non-progressive resources https://tempesta-tech.com/knowledge-base/HTTP2-streams-prioritization/#tempesta-fw - we should not share bandwidth between same-weight sibling streams.

Probably we should not do this. I tested the Firefox, Chrome and Edge browsers using https://www.webpagetest.org/, and only Firefox does not use the EXCLUSIVE flag for non-progressive resources (see the screenshot for Edge below). It is known that Opera also uses EXCLUSIVE. So this optimization would apply only to progressive JPEG in Firefox, which is too narrow a case for an optimization.

[screenshot: Edge streams using the EXCLUSIVE flag]

However, this is not the end of the story. Here is a waterfall emulation for Chrome (also for our website):

[screenshot: Chrome waterfall emulation]

and here is the HTTP/2 streams dependency graph for the same run:

[screenshot: HTTP/2 streams dependency graph]

Streams 5 (jquery.min.js) and 7 (tempesta_technologies_logo.png) (rows 3 and 4 in the waterfall graph) share bandwidth, which can be observed in both the dependency and the waterfall graphs. They should not share bandwidth. The waterfall graph also reports both streams as EXCLUSIVE. It can also be observed that the image streams actually overlap in downloading, so it seems they do share resources.

Late image downloading is not crucial, but the first resources before reaching LCP are, and people (e.g. me) spend a lot of time on LCP optimizations. It could make sense to download JS code before an image to let it execute while the image is being downloaded.

On the frontend side I can specify prefetch priority as low, high or unspecified, but I cannot define HTTP/2 stream priority, so there is a very limited capability to specify the exact order of resource downloading.

This requires more research and the results must be published in the wiki.

@EvgeniiMekhanik commented Jun 17, 2024

I added debug messages to the Tempesta FW source code in several places (when we create a stream, when we make frames, and when we change stream dependencies according to priority frames). Then I ran WebPageTest and looked at how Chromium creates and prioritizes streams for our site. First, Chromium creates streams with ids from 1 to 35, each new stream depending on the previous one, so we have the priority tree 1->3->5->7->9->11->13->15->17->19->21->23->25->27->29->31->33->35. Then, after sending all the data for these streams, Chromium creates stream 37 depending on 35, 39 depending on 37, 41 depending on 33, then 43->41 and 45->43, so our tree becomes 27->29->31->33->41->43->45->35->37->39 (streams 1 to 25 are deleted since all their data has been sent). Then Chromium creates streams 47->39, 49->47, 49->51, 51->53, 53->55 and then immediately changes the stream dependencies to 49->39 and 47->0, so the priority tree becomes
47->41->43->45->37->39->49->51->53->55; then Chromium creates streams 57->55, 59->61, 61->63, 65->63, 67->65. Later Chromium creates more streams, but the priority tree always looks like a list. So streams created by Chromium never share bandwidth. You can also see this in the WebPageTest waterfall if you open the request details and sum the request start time and the time to first byte.

@EvgeniiMekhanik commented Jun 18, 2024

  • While exploring how Chromium creates and prioritizes streams, it was found that Chromium always does it with the exclusive flag; that is why for Chromium the priority tree is a list of streams. This is a problem for our scheduler, because we go from level to level in the scheduler to find an active stream, so for the least-priority stream we have O(N). We should therefore remove a stream from the priority tree after sending all its data if it is the only stream on its level (see the sketch below).
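
A sketch of the proposed removal on a generic chain (the real scheduler keeps per-level children in eb_roots inside TfwStreamSchedEntry): when a finished stream is alone on its level, splice its sole child up so traversal does not cross exhausted levels:

    struct chain_stream {
        struct chain_stream *parent;
        struct chain_stream *child;   /* sole dependent in the chain */
        unsigned long       pending;  /* bytes still to send */
    };

    /* Unlink a finished stream that is the only one on its level,
     * so the O(N) level-by-level walk no longer visits it.
     */
    static void
    chain_remove_if_done(struct chain_stream *s)
    {
        if (s->pending)
            return;
        if (s->parent)
            s->parent->child = s->child;
        if (s->child)
            s->child->parent = s->parent;
        s->parent = s->child = NULL;
    }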

@krizhanovsky commented Jul 9, 2024

I explored youtube.com, ebay.com, facebook.com and images.google.com, and none of them uses progressive JPEG. Some sources mention that the hyperscalers use various optimization techniques, and I may observe no progressive JPEGs due to various factors, e.g. a good connection. But the point is that we never see the case where we actually make a decision on a resource weight (except in Firefox). So why have ebtrees? Since we have linked lists of dependencies, we make decisions on dependency alone, not weight. Aren't simple linked lists good enough instead of ebtrees? Also, I don't think we should invent a page with tens of AJAX resources - we need something real for core optimizations.

UPDATE My concern is that in tfw_h2_sched_stream_dequeue(), with the loop and the active-stream checking inside, we essentially traverse the dependency tree. Let's consider the waterfall for our website https://www.webpagetest.org/result/240629_BiDcN0_5Z0/2/details/ and https://www.webpagetest.org/http2_dependencies.php?test=240629_BiDcN0_5Z0&run=2 : stream 1 depends on 0 (root), 3 depends on 1, 5 depends on 3. All of them have weight 256. In this function we might get any of the streams, say stream 3, which is blocked on stream 1. So we go up the dependency tree to fetch stream 1, which is active. Next, we remove it from the ebtree and reinsert it with an updated deficit, so it might be put somewhere deep, because streams 3 and 5 haven't sent anything yet and according to the deficit accounting we should send on them. The takeaways are:

  1. An ebtree over all the streams makes no sense, since at any moment we can make progress only on a very small number of streams - and with the EXCLUSIVE flag used for each stream, on one stream in particular.
  2. We should schedule on a dependency tree, probably with some levels in an ebtree, or with simple linked lists as in the Firefox case. The tree is reconstructed on stream reprioritization and should keep the most prioritized stream (or subtree of streams) that is supposed to be sent next.

@EvgeniiMekhanik
We already have a dependency tree, because each stream contains a TfwStreamSchedEntry, which contains struct eb_root active and struct eb_root blocked - the roots of the active and blocked children! For Chromium we have an eb_tree with only one element on each level! So in your example we place stream 1 in the same place
@krizhanovsky commented Jul 11, 2024

On the call we agreed to:

  • make an optimization not to recompute the deficit and reinsert a stream if it is exclusive.

  • schedule and send streams in full based on weight, without stream reinsertion, by default, and distribute bandwidth by deficit under a new per-vhost option. This seems to be SETTINGS_NO_RFC7540_PRIORITIES and RFC 9218 streams prioritization #2171.

  • since RFC 7540 stream scheduling is only required for progressive JPEGs, not spend more resources on the EEVDF-like lag recomputations. See the update below - nghttp2 handles the lag (penalty) in a good way.

  • Dependency Cycle Attack (CVE-2015-8659) prevention - @EvgeniiMekhanik, do we have a check for a stream depending on itself? But how about a cycle? Remember any item in a loop?

  • UPDATE One more question/task from the WFQ algorithm review: H2O seems to use an inverted version of the DRR algorithm (maybe it has its own name, but I didn't find it). The implementation suffers from the slow and inaccurate division operation and is also prone to overflow. I think we should use the original DRR, where "deficit" means real deficit and we choose the stream with the maximum deficit to provide fairness.

  • UPDATE Please also explore and limit the minimum size of TLS records and HTTP/2 frames: if we have a lot of data to send but only about 10% of the MSS available, postpone the transmission; if the requested transmission fits the available space, send it normally. A sketch follows below.
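
A sketch of the last check (the MSS/10 threshold is taken from the point above; the helper name is an assumption):

    #include <stdbool.h>

    #define MIN_XMIT_DIV  10  /* assumed tunable: at least MSS/10 */

    /* Decide whether to postpone transmission: a large backlog with
     * only a sliver of window available would produce a tiny TLS
     * record or HTTP/2 frame, so wait for more room instead.
     */
    static bool
    should_postpone_xmit(unsigned long avail, unsigned long queued,
                         unsigned long mss)
    {
        if (queued <= avail)
            return false;   /* fits entirely - send normally */
        return avail < mss / MIN_XMIT_DIV;
    }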

@krizhanovsky

h2load is just a benchmarking tool, but it does use non-exclusive streams sharing the network bandwidth (using #2162; pay attention to stream 11):

[  427.452305] [tempesta fw]       Add new stream dependency: stream (id 1 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.456395] [tempesta fw]       Add new stream dependency: stream (id 3 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.461308] [tempesta fw]       Add new stream dependency: stream (id 5 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.467074] [tempesta fw]       Add new stream dependency: stream (id 7 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.474366] [tempesta fw]       Add new stream dependency: stream (id 9 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.490346] [tempesta fw]       Add new stream dependency: stream (id 11 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.500529] [tempesta fw]       Add new stream dependency: stream (id 13 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.510074] [tempesta fw]       Add new stream dependency: stream (id 15 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.519623] [tempesta fw]       Add new stream dependency: stream (id 17 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.530787] [tempesta fw]       Add new stream dependency: stream (id 19 deficit 4096 excl 0) depends from stream with id 0, ctx ffff88815500b5d8
[  427.922484] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[  427.927867] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  427.935859] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  427.942510] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  427.948161] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  427.956394] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  427.966524] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  427.973643] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  427.979752] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  427.987469] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  427.994062] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.000928] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.012869] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.020356] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.027767] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.032977] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.040012] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.046210] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.055159] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.062547] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.068415] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.073641] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.081857] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.081970] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[  428.082177] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[  428.088541] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.104916] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.116399] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[  428.116508] [tempesta fw]        Stream with id (17) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.122353] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[  428.128941] [tempesta fw]        Stream with id (7) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.138977] [tempesta fw]        Stream with id (17) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.148734] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[  428.154125] [tempesta fw]        Stream with id (19) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.161283] [tempesta fw]        Stream with id (15) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.165517] [tempesta fw]        Stream with id (7) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.169972] [tempesta fw]        Stream with id (19) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.174175] [tempesta fw]        Stream with id (15) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  428.181487] [tempesta fw]        Stream with id (17) removed from dependency tree for making frames, ctx ffff88815500b5d8
.....
[  429.019928] [tempesta fw]        Stream with id (9) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  429.022839] [tempesta fw]        Stream with id (13) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  429.025755] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  429.028449] [tempesta fw]        Stream with id (17) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  429.032650] [tempesta fw]        Stream with id (7) removed from dependency tree for making frames, ctx ffff88815500b5d8
....
[  429.065219] [tempesta fw]        Stream with id (13) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  429.068462] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  429.071297] [tempesta fw]        Stream with id (17) removed from dependency tree for making frames, ctx ffff88815500b5d8
....
[  429.099330] [tempesta fw]        Stream with id (13) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  429.103279] [tempesta fw]        Stream with id (11) removed from dependency tree for making frames, ctx ffff88815500b5d8
[  429.105973] [tempesta fw]        Stream with id (17) removed from dependency tree for making frames, ctx ffff88815500b5d8
...

As a result (w/o debug printing) we get much better timings for a smaller number of concurrent streams:

h2load -n 1000 -c 1 -t 1 -m 10 https://tempesta-tech.com
starting benchmark...
spawning thread #0: 1 total client(s). 1000 total requests
TLS Protocol: TLSv1.2
Cipher: ECDHE-ECDSA-AES128-GCM-SHA256
Server Temp Key: ECDH P-256 256 bits
Application protocol: h2

finished in 900.00ms, 33.33 req/s, 9.29MB/s
requests: 1000 total, 40 started, 30 done, 30 succeeded, 970 failed, 970 errored, 0 timeout
status codes: 30 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 8.36MB (8762507) total, 728B (728) headers (space savings 94.58%), 8.35MB (8755770) data
                     min         max         mean         sd        +/- sd
time for request:   190.24ms    299.48ms    237.41ms     35.62ms    60.00%
time for connect:     4.88ms      4.88ms      4.88ms         0us   100.00%
time to 1st byte:   199.67ms    199.67ms    199.67ms         0us   100.00%
req/s           :      33.34       33.34       33.34        0.00   100.00%
h2load -n 1000 -c 1 -t 1 -m 100 https://tempesta-tech.com
starting benchmark...
spawning thread #0: 1 total client(s). 1000 total requests
TLS Protocol: TLSv1.2
Cipher: ECDHE-ECDSA-AES128-GCM-SHA256
Server Temp Key: ECDH P-256 256 bits
Application protocol: h2
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done

finished in 1.47s, 5.43 req/s, 1.51MB/s
requests: 1000 total, 1000 started, 908 done, 8 succeeded, 992 failed, 992 errored, 0 timeout
status codes: 8 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 2.23MB (2339125) total, 486B (486) headers (space savings 86.44%), 2.22MB (2324800) data
                     min         max         mean         sd        +/- sd
time for request:      1.28s       1.43s       1.36s     53.59ms    62.50%
time for connect:     5.02ms      5.02ms      5.02ms         0us   100.00%
time to 1st byte:      1.29s       1.29s       1.29s         0us   100.00%
req/s           :       5.43        5.43        5.43        0.00   100.00%

@const-t commented Nov 11, 2024

if (unlikely((r = ...)))
    goto err;

@krizhanovsky

In our talk we discussed that both Firefox and Chrome have moved to RFC 9218, so there is no reason for us to maintain RFC 7540 prioritization. I propose to simply remove the current streams prioritization and replace it with RFC 9218 (a sketch follows below). This way we'll maintain a smaller code base, the code will probably be a bit faster, and we'll remove the borrowed ebtree code. It's a pity to remove the well-engineered code though :(
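
A hedged sketch of the RFC 9218 per-stream state, which shows why the ebtree can go away: each stream carries only an urgency (0..7, default 3) and an incremental flag, parsed from a field such as "priority: u=2, i", so eight urgency buckets with round robin inside a bucket suffice. The naive token scan below stands in for a real RFC 8941 structured-field parser:

    #include <stdbool.h>
    #include <string.h>

    struct h2_prio {
        unsigned char urgency;     /* 0 (highest) .. 7 (lowest) */
        bool          incremental; /* round robin within the bucket */
    };

    static void
    h2_parse_priority(const char *val, struct h2_prio *p)
    {
        const char *u = strstr(val, "u=");

        /* RFC 9218 defaults. */
        p->urgency = 3;
        p->incremental = false;

        if (u && u[2] >= '0' && u[2] <= '7')
            p->urgency = u[2] - '0';
        /* Naive scan for the "i" token; a real parser must handle
         * RFC 8941 structured fields.
         */
        if (strstr(val, "i"))
            p->incremental = true;
    }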

@krizhanovsky modified the milestones: 1.0 - GA, 0.9 - LA Mar 3, 2025
EvgeniiMekhanik added commits that referenced this issue on Apr 15, Apr 17, Apr 18 (twice) and May 2, 2025, with the message:

According to information from ChatGPT, Huffman encoding gives a compression benefit of about 20-60%. According to the article from task #1411, Huffman gives a benefit of about 20%. We decided to allocate an extra 50% of space during Huffman decoding to prevent extra allocations (make only one allocation).