H2 related corrections and optimizations. #1411
The current HTTP/2 implementation and its TODO issues must be revised according to RFC 9113.
Probably we should not do this. I tested the Firefox, Chrome and Edge browsers using https://www.webpagetest.org/ and only Firefox does not use. However, this is not the end of the story. Here is a waterfall emulation for Chrome (also for our website): and here is the HTTP/2 stream dependency graph for the same run: Streams 5 (jquery.min.js) and 7 (tempesta_technologies_logo.png) (rows 3 and 4 in the waterfall graph) share bandwidth, which can be observed in both the dependency and the waterfall graphs. They should not share bandwidth. The waterfall graph also reports both streams as. Late image downloading is not crucial, but the first resources before reaching LCP are very crucial, and people (e.g. me) do spend a lot of time on LCP optimization. It could make sense to download JS code before the image to let it execute while the image is being downloaded. On the frontend side I can specify the prefetch priority as. This requires more research, and the results must be published in the wiki.
I added debug messages to the Tempesta FW source code in several places (when we create a stream, when we make frames, when we change a stream dependency according to PRIORITY frames). I ran WebPageTest and looked at how Chromium creates and prioritizes streams for our site. First of all, Chromium creates streams with ids from 1 to 35, each new stream depending on the previous one, so we have this priority tree: 1->3->5->7->9->11->13->15->17->19->21->23->25->27->29->31->33->35. Then, after sending all data for these streams, Chromium creates stream 37, which depends on 35; 39, which depends on 37; 41, which depends on 33; 43->41; 45->43, so our tree is 27->29->31->33->41->43->45->35->37->39 (we delete streams 1 to 25 since we have sent all their data). Then Chromium creates streams 47->39, 49->47, 49->51, 51->53, 53->55 and then immediately changes the stream dependencies: 49->39 and 47->0, so the priority tree will be
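The chain-and-reparent behavior described above can be modeled with a few lines of C. This is a hypothetical sketch (the types are not Tempesta FW internals): each new stream depends on the previously created one, and a later PRIORITY frame re-parents a stream, e.g. 49->39 and 47->0 as observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal model of an RFC 7540 dependency chain (hypothetical types). */
struct stream {
	unsigned int id;
	struct stream *parent;	/* NULL models dependency on stream 0 */
};

static struct stream *stream_new(unsigned int id, struct stream *parent)
{
	struct stream *s = malloc(sizeof(*s));
	s->id = id;
	s->parent = parent;
	return s;
}

/* Re-parent @s as a PRIORITY frame would. */
static void stream_reprioritize(struct stream *s, struct stream *new_parent)
{
	s->parent = new_parent;
}

/* Depth of a stream in the chain (children of stream 0 have depth 1). */
static int stream_depth(const struct stream *s)
{
	int d = 0;
	for (; s; s = s->parent)
		++d;
	return d;
}
```

In this model Chromium's initial tree is just a deepening chain, and a single pointer update performs the dependency change the debug messages show.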
I explored youtube.com, ebay.com, facebook.com and images.google.com, and none of them use progressive JPEG. Some sources mention that the hyperscalers use various optimization techniques, and I may be observing no progressive JPEGs due to various factors, e.g. a good connection. But the point is that we never see a case where we actually make a decision based on a resource weight (except in Firefox). So why have ebtrees? Since we have linked lists of dependencies, we make decisions based solely on dependency, not weight. Aren't simple linked lists good enough instead of ebtrees? Also, I don't think we should invent a page with tens of AJAX resources - we need something real for core optimizations. UPDATE My concern is that in
We already have a dependency tree, because each stream contains
On the call we agreed to
As a result (w/o debug printing) we have much better timing for a smaller number of streams.
In our talk we discussed that both Firefox and Chrome have moved to RFC 9218, so there is no reason for us to maintain the RFC 7540 scheme. I propose to just remove the stream prioritization and replace it with RFC 9218. This way we'll maintain a smaller code base, the code will probably be a bit faster, and we will remove the borrowed ebtree code. It's a pity to remove the well engineered code though :(
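For reference, RFC 9218 replaces the dependency tree with a Priority header field carrying an urgency (0-7, default 3) and an incremental flag, e.g. "u=3, i". Below is a hypothetical sketch of extracting these two parameters; a real parser must follow the RFC 8941 structured-fields grammar, which this deliberately does not.

```c
#include <assert.h>

/* Parsed RFC 9218 priority parameters (hypothetical type). */
struct h2_priority {
	int urgency;		/* 0 (highest) .. 7 (lowest), default 3 */
	int incremental;	/* boolean, default 0 */
};

/*
 * Naive scan for "u=<digit>" and a bare "i" token in a Priority
 * field value; illustration only, not RFC 8941 compliant.
 */
static void priority_parse(const char *val, struct h2_priority *p)
{
	p->urgency = 3;
	p->incremental = 0;
	for (; *val; ++val) {
		if (val[0] == 'u' && val[1] == '=' &&
		    val[2] >= '0' && val[2] <= '7')
			p->urgency = val[2] - '0';
		else if (val[0] == 'i' &&
			 (val[1] == '\0' || val[1] == ',' || val[1] == ' '))
			p->incremental = 1;
	}
}
```

The appeal for the server side is visible here: scheduling reduces to 8 urgency buckets plus a round-robin flag, instead of a mutable dependency tree.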
According to information from ChatGPT, Huffman encoding gives a compression benefit of about 20-60%. According to the article from task #1411, Huffman gives a benefit of about 20%. We decided to allocate extra 50% space during Huffman decoding to prevent extra allocations (i.e. make only one allocation).
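The sizing decision above amounts to one line of arithmetic: since the encoded form is roughly 20-60% smaller than the decoded form, reserving 150% of the encoded length up front covers the common case with a single allocation. A sketch with a hypothetical helper name (not the Tempesta FW function):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Buffer size to reserve before HPACK Huffman decoding: 150% of the
 * encoded length, so that for typical 20-40% compression ratios one
 * allocation suffices and no mid-decode reallocation is needed.
 */
static size_t huffman_decode_alloc_size(size_t encoded_len)
{
	return encoded_len + encoded_len / 2;
}
```

Note the trade-off: inputs compressed by more than 33% can still overflow this estimate and fall back to an extra allocation; the 50% margin only makes that path rare, not impossible.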
Bugs
The header 'age' is added unconditionally here; thus it looks like it should be skipped in tfw_cache_copy_resp(); MekhanikEvgenii - will be fixed in Fix double adding 'age' header. #2398
data = tfw_pool_alloc_not_align(it->pool, sz) should be moved down - after the static index processing; MekhanikEvgenii - Fixed by Alloc memory in tfw_hpack_hdr_name_set only for dynamic indexing #2350

The header 'host' can have an empty field-value here (this possibility is mentioned in RFC 7230 section 5.4); (Outdated code. The Host header can't be empty for HTTP/2. http2_general.test_h2_headers contains test_empty_host_header, which verifies this.)

Due to the nature of the internal implementation of the encoder dynamic table in TempestaFW (for performance purposes), its maximum size is limited - it cannot be greater than HPACK_ENC_TABLE_MAX_SIZE; so for now, if the client sends a SETTINGS frame with a new maximum table size greater than HPACK_ENC_TABLE_MAX_SIZE, a table inconsistency between the endpoints may arise; to avoid this problem, a dynamic table size update with HPACK_ENC_TABLE_MAX_SIZE should be sent to the client's decoder (within the first HEADERS frame following the SETTINGS acknowledgment, see RFC 7541 section 4.2); This is SETTINGS_HEADER_TABLE_SIZE > 4096 bytes does not work as expected #1807

The condition here should be - MekhanikEvgenii - Fixed by #2351
Duplicated cookie headers should be concatenated into one single header before forwarding to an HTTP/1.1 connection (RFC 7540 section 8.1.2 p. 5 "Compressing the Cookie Header Field"); Forwarding http2 splitted cookie header for #1736

It seems like there is a bug here, since the function exits in any case and the remainder of the eolen in the next fragment is not deleted; besides, the tfw_http_msg_del_eol() function is completely unused for now, as is skb_next_data(), so it looks like these procedures can be removed; MekhanikEvgenii - This function is already removed.
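The cookie concatenation rule referenced above (RFC 7540 section 8.1.2.5) joins the split cookie field values with "; " before downgrading to HTTP/1.1. A self-contained sketch with a hypothetical helper, not the Tempesta FW implementation:

```c
#include <stdio.h>

/*
 * Concatenate @n cookie field values into @dst separated by "; ",
 * as RFC 7540 section 8.1.2.5 requires before forwarding over
 * HTTP/1.1. Returns the resulting length, or 0 on truncation.
 */
static size_t cookie_concat(char *dst, size_t dst_len,
			    const char *const *vals, size_t n)
{
	size_t off = 0;
	size_t i;

	for (i = 0; i < n; ++i) {
		int r = snprintf(dst + off, dst_len - off, "%s%s",
				 i ? "; " : "", vals[i]);
		if (r < 0 || (size_t)r >= dst_len - off)
			return 0;	/* output buffer too small */
		off += (size_t)r;
	}
	return off;
}
```

So three HTTP/2 cookie fields "a=b", "c=d", "e=f" must be forwarded as the single header value "a=b; c=d; e=f".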
(TBD) For now TempestaFW closes the connection in case of any error during message parsing, but for an HTTP/2 connection malformed messages should be treated as a stream error, not a connection-wide error, which means that the connection should not be closed - only the stream (RFC 7540 section 8.1.2 p. 6 "Malformed Requests and Responses");
MekhanikEvgenii - We decided that we can't do this, because the client adds all sent headers to its HPACK dynamic table. When we can't parse a message, we don't save its headers in the HPACK dynamic table; moreover, we discard the remaining message data after an error occurs! So we can't serve new requests from such a client.
Looks like the assignment should be moved to TDBv0.2: Cache background revalidation and eviction #515

tfw_h2_stream_id_close() calls are close to each other (e.g. in the tfw_h2_resp_adjust_fwd() procedure) and stream_id can be saved locally between the calls; but this way is not suitable for cases where the tfw_h2_stream_id_close() calls are rather far apart - an example of such a case is failed redirection response generation: if tfw_h2_stream_id_close() had been called in tfw_h2_prep_redirect(), but then an error occurred before the response was actually created/sent, tfw_h2_send_resp() will not send an error response to the client, since req->stream is already NULL; to resolve the problem in a general way, the tfw_h2_stream_id_close() function should be changed a little to assign the id to a special new internal field of req->pit (instead of returning it as now) and to check that field during subsequent processing (i.e. when an error response should be generated after a failed response forwarding or a failed redirection response generation) - before calling tfw_h2_stream_id_close(); if this field is set (non-zero), then tfw_h2_stream_id_close() has already been called and need not be called again - just use the id saved in the field; if the field is not set (zero), then tfw_h2_stream_id_close() should be called; this field need not be synchronized, since (unlike the req->stream field) it can be accessed only from the one thread in which the current request is being processed (during forwarding the request to the server, or forwarding the corresponding response to the client); this approach should completely avoid the problem of multiple calls to tfw_h2_stream_id_close() during internal response generation (including failed redirection response generation) and during server response forwarding. MekhanikEvgenii - This problem was fixed in several different PRs. First of all, this part of the code was reworked when frame making was moved to the xmit callback, and when stream processing (the send part) was moved to the xmit callback as well. Finally, the problem was fixed in da24b86
Optimizations
It seems that just one call of the tfw_http_hdr_split() procedure can be used here instead of the tfw_http_msg_srvhdr_val() call and the tfw_h2_msg_hdr_length() call below (i.e. for now @h_val and @s_val are two variables for the same instance); MekhanikEvgenii - Will be fixed by Remove unnecessary tfw_http_hdr_split #2368. It seems that we can remove tfw_http_hdr_split instead of tfw_http_msg_srvhdr_val
In the case of response forwarding (the TFW_H2_TRANS_INPLACE type of H2 encoding operations), the tfw_http_hdr_split() procedure is called twice for the same header: here and here; maybe it is worth calling this procedure only once, but earlier - at the beginning of the @tfw_hpack_encode() function, before the tfw_hpack_encoder_index() procedure call. See also Rewrite tfw_hpack_node_compare to make it clean & fast #1920 (comment) and #1920 (comment). MekhanikEvgenii - fixed in Call tfw_http_hdr_split only during hpack encoding #2367
In the case of a Huffman-encoded header name and/or value, it is worth allocating (and assigning to it->rspace) a bit more memory here; e.g., as mentioned in Exploring HTTP/2 Header Compression section 3.2, Huffman can compress by 20% on average, which means that allocating about 125% of the initial Huffman-encoded size (i.e. res_len = len + len/4) will in most cases avoid additional space allocations and TfwStr expansions in the tfw_hpack_huffman_write() procedure (note that in this case the warnings for it->rspace here and in other similar places should be removed); MekhanikEvgenii - will be fixed in Mekhanik evgenii/1411 7 #2403
If the tfw_hpack_exp_hdr() function is called here, then an intersection with the it->parsed_hdr descriptor allocations in req->pool occurs, and the subsequent parsing of the header's value will require full re-allocations in req->pool (instead of the fast path) in the tfw_pool_realloc() procedure; this problem can be avoided either via an additional pool (specially for it->hdr descriptors), or via re-initialization of the it->hdr descriptor; in the latter variant it->hdr will be reinitialized in any case here - with the new values of it->pos and length (as in the BUFFER_NAME_OPEN() macro), and the old descriptor(s) in it->hdr for the header's name will be just forgotten (as they are not needed any more, since the name is parsed by now and all necessary information is already in it->parsed_hdr); note, however, that in this variant the intersection with the it->parsed_hdr descriptor allocations in req->pool can still take place if it->hdr becomes compound during Huffman decoding of the header's value, but considering the changes proposed in p.3 above, the case of tfw_hpack_exp_hdr() being called from tfw_hpack_huffman_write() should be very rare, so in most cases it->hdr should remain plain and intersections will not occur;

For now TempestaFW does not know the static indexes for headers which are defined in the configuration file for adding, substituting or appending, but these headers are fully known at the configuration stage, so it is worth trying to determine static indexes for them; two ways look possible: either allow the specification of static indexes for the corresponding headers directly in the configuration file, or implement something like the tag search from Exploring HTTP/2 Header Compression section 3.1 (in the part devoted to headers defined in the static table) at the configuration stage. Done in Fixed framing error for responses received from cache with header modification #1831.
Since the tfw_h2_stream_id() function is called only from the request receive path, and there must not be concurrent access to req->stream until the request is passed to the server connection, the locking in tfw_h2_stream_id() can be removed; MekhanikEvgenii - there is PR Mekhanik evgenii/1411 3 #2365, which should fix this problem. In fact this point is not correct: when the cache is not sharded, we don't need to call it at all, because stream_id can't be 0 here. If the cache is sharded, there can be concurrent access to req->stream, because cache processing can be called on another CPU. But there is also no sense in checking stream_id, because the stream can be closed at any time! For example, if stream_id is not equal to zero before tfw_cache_build_resp, there is no guarantee that it will not become zero during this function call! We check that req->stream is not NULL when calling tfw_h2_stream_init_for_xmit, where we access the req->stream pointer.

Perhaps it would be wise to create a version of the tfw_h2_stream_id_close() procedure for the cases when it is called from the request receive path (e.g. from request processing in the cache - here and here), since there must not be concurrent access to req->stream, and it must be valid (non-NULL) until the request is passed to the server connection. MekhanikEvgenii - there is PR Mekhanik evgenii/1411 3 #2365, which should fix this problem. In fact this point is not correct: when the cache is not sharded, we don't need to call it at all, because stream_id can't be 0 here. If the cache is sharded, there can be concurrent access to req->stream, because cache processing can be called on another CPU. But there is also no sense in checking stream_id, because the stream can be closed at any time! For example, if stream_id is not equal to zero before tfw_cache_build_resp, there is no guarantee that it will not become zero during this function call! We check that req->stream is not NULL when calling tfw_h2_stream_init_for_xmit, where we access the req->stream pointer.