Description
HTTP Load balancing
Separated from #76, in particular from #76 (comment). A faster implementation of HTTP field matching is required for HTTP load balancing and filtering. One option is a hash table that allows a quick jump to a rule by its key, where the key is calculated from the string and the ID of the HTTP field. Additionally or alternatively, the BNDM with q-Grams (BG) algorithm can be used to quickly process many strings with a common prefix.
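A minimal user-space sketch of the hash table idea, to make the "key from string and field ID" part concrete. Everything here is an assumption for illustration and not existing Tempesta FW code: the names rule_key(), rule_add() and rule_lookup(), the table size, FNV-1a as the hash, and linear probing. The comparison is case-sensitive, which a real Host match would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RULE_TBL_SZ 1024 /* hypothetical size, must be a power of two */

/* Hypothetical rule record: HTTP field ID, string to match, target group. */
struct rule {
	int		field_id;
	const char	*str;
	int		target;
	int		used;
};

static struct rule rule_tbl[RULE_TBL_SZ];

/* FNV-1a over the field ID followed by the string bytes. */
static uint32_t
rule_key(int field_id, const char *s, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	h = (h ^ (uint32_t)field_id) * 16777619u;
	for (i = 0; i < len; i++)
		h = (h ^ (unsigned char)s[i]) * 16777619u;
	return h;
}

/* Insert with linear probing; returns 0 on success, -1 if the table is full. */
static int
rule_add(int field_id, const char *str, int target)
{
	uint32_t i, h = rule_key(field_id, str, strlen(str));

	assert((RULE_TBL_SZ & (RULE_TBL_SZ - 1)) == 0);
	for (i = 0; i < RULE_TBL_SZ; i++) {
		struct rule *r = &rule_tbl[(h + i) & (RULE_TBL_SZ - 1)];

		if (!r->used) {
			*r = (struct rule){ field_id, str, target, 1 };
			return 0;
		}
	}
	return -1;
}

/* Returns the target of the matching rule, or -1 if there is none. */
static int
rule_lookup(int field_id, const char *s, size_t len)
{
	uint32_t i, h = rule_key(field_id, s, len);

	for (i = 0; i < RULE_TBL_SZ; i++) {
		const struct rule *r = &rule_tbl[(h + i) & (RULE_TBL_SZ - 1)];

		if (!r->used)
			return -1; /* empty slot ends the probe sequence */
		if (r->field_id == field_id && strlen(r->str) == len
		    && !memcmp(r->str, s, len))
			return r->target;
	}
	return -1;
}
```

With such a table, an exact-match rule set like the 1000 hdr_host rules costs one hash computation and a short probe per request instead of a scan over all rules. It only covers eq matches; prefix or suffix matches would still need something like BG.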
Issue #76 deals with a massive number of backend servers:
srv_group group_0 { server 127.0.0.1:9090 conns_n=1; }
srv_group group_1 { server 127.0.0.1:9090 conns_n=1; }
srv_group group_2 { server 127.0.0.1:9090 conns_n=1; }
....
srv_group group_999 { server 127.0.0.1:9090 conns_n=1; }
sched_http_rules {
match group_0 hdr_host eq "group-0.com";
match group_1 hdr_host eq "group-1.com";
match group_2 hdr_host eq "group-2.com";
....
match group_999 hdr_host eq "group-999.com";
}
Currently all 1000 or more match rules are evaluated sequentially. The example is quite realistic for massive hosting installations. The BG algorithm implemented in #901 must be applied to the matching. The matching syntax should probably be adjusted as follows (with #731 in mind):
host == {
"group-0.com" -> group_0;
"group-1.com" -> group_1;
"group-2.com" -> group_2;
}
HTTPtables
Strings matching
The use case from #731 must also be processed in a more efficient way, e.g. using a hash table or a tree:
http_chain {
mark == {
2 -> backend_0;
3 -> backend_1;
4 -> backend_2;
5 -> backend_3;
....
}
}
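For integer keys such as marks, a sorted array with binary search is a simple stand-in for the "tree" option. The sketch below is user-space C under stated assumptions: struct mark_rule, mark_rules_prepare() and mark_lookup() are hypothetical names, and backends are represented by plain indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rule: a mark value and the index of the backend it selects. */
struct mark_rule {
	unsigned int	mark;
	int		backend;
};

static int
mark_cmp(const void *a, const void *b)
{
	unsigned int x = ((const struct mark_rule *)a)->mark;
	unsigned int y = ((const struct mark_rule *)b)->mark;

	return (x > y) - (x < y);
}

/* Sort the rules once, at configuration time. */
static void
mark_rules_prepare(struct mark_rule *rules, size_t n)
{
	qsort(rules, n, sizeof(*rules), mark_cmp);
}

/* O(log n) lookup at run time; returns the backend index or -1. */
static int
mark_lookup(const struct mark_rule *rules, size_t n, unsigned int mark)
{
	struct mark_rule key = { mark, 0 };
	const struct mark_rule *r;

	assert(rules || !n);
	r = bsearch(&key, rules, n, sizeof(*rules), mark_cmp);
	return r ? r->backend : -1;
}
```

Sorting drops the original rule order, so this only fits blocks where each mark appears once; duplicate marks with first-match semantics would need extra handling.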
Memory spatial locality
At the moment kzalloc() is used a lot during the configuration phase, so spatial locality at run time can be improved by using more local data structures.
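One way to get more local data structures is to carve all rules out of a single contiguous region instead of allocating each one separately. The following is a user-space sketch only: struct arena, arena_alloc() and rule_new() are hypothetical, and calloc() stands in for whatever kernel allocation would back the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump allocator over one contiguous, zeroed region. */
struct arena {
	char	*base;
	size_t	size;
	size_t	used;
};

static int
arena_init(struct arena *a, size_t size)
{
	a->base = calloc(1, size);
	a->size = size;
	a->used = 0;
	return a->base ? 0 : -1;
}

static void *
arena_alloc(struct arena *a, size_t n)
{
	void *p;

	assert(a->base);
	/* Round up so every chunk stays pointer-aligned. */
	n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
	if (n > a->size - a->used)
		return NULL;
	p = a->base + a->used;
	a->used += n;
	return p;
}

/* Hypothetical rule with the pattern stored inline, right after the header. */
struct rule {
	int	target;
	size_t	len;
	char	str[];
};

static struct rule *
rule_new(struct arena *a, const char *s, int target)
{
	size_t len = strlen(s);
	struct rule *r = arena_alloc(a, sizeof(*r) + len + 1);

	if (!r)
		return NULL;
	r->target = target;
	r->len = len;
	memcpy(r->str, s, len + 1);
	return r;
}
```

Rules created one after another end up adjacent in memory, together with their pattern strings, so a scan over them touches consecutive cache lines rather than scattered allocations.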
The chains
Currently HTTPtables sequentially scans all the rules in a chain, which isn't efficient. The first option is to run only one per-header match using multi-pattern matching. Probably, there are also other optimization opportunities.
We need some use cases with large chains to understand the typical workload, i.e. whether there are cases with many patterns for the same header or whether the matchers mostly target different headers.
Generic strings matching
Actually, Tempesta FW is full of multiple string matching. E.g. the caching policy match for a content type suffix is performed with a FOR loop in tfw_capolicy_match(), while a powerful web resource can have a lot of various suffixes: aif, aiff, au, avi, bin, bmp, cab, carb, cct, cdf, class, css, doc, dcr, dtd, gcf, gff, gif, grv, hdml, hqx, ico, ini, jpeg, jpg, js, mov, mp3, nc, pct, ppc, pws, swa, swf, txt, vbs, w32, wav, wbmp, wml, wmlc, wmls, wmlsc, xsd, zip.
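As an illustration of replacing a linear loop over suffixes, the sketch below keeps a sorted suffix table and uses binary search. It is not the tfw_capolicy_match() code: suffix_known() and the table are hypothetical, the table holds only a subset of the suffixes listed above, and the comparison is case-sensitive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Subset of the suffixes above; must stay sorted in strcmp() order. */
static const char *const suffixes[] = {
	"avi", "bmp", "css", "gif", "ico", "jpeg", "jpg", "js", "mp3",
	"txt", "wav", "zip",
};

/* bsearch() comparator: key is a C string, elem points to a table entry. */
static int
sfx_cmp(const void *key, const void *elem)
{
	return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns 1 if the path ends with ".<suffix>" for a suffix in the table. */
static int
suffix_known(const char *path)
{
	const char *dot;

	assert(path);
	dot = strrchr(path, '.');
	if (!dot || !dot[1])
		return 0;
	return bsearch(dot + 1, suffixes,
		       sizeof(suffixes) / sizeof(suffixes[0]),
		       sizeof(suffixes[0]), sfx_cmp) != NULL;
}
```

For the full list of about 45 suffixes this needs at most 6 string comparisons per lookup instead of up to 45. A hash table or the BG matcher would be alternatives with different trade-offs.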
Testing
Functional tests
TBD
Performance
We need a solid estimate of the number of rules and/or chains at which performance degrades significantly.
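As a starting point for such an estimate, the number of string comparisons per request can serve as a rough, machine-independent proxy for the cost of sequential matching. The sketch below models the hdr_host example with generated "group-<i>.com" patterns; seq_match_cost() is a hypothetical helper and not a replacement for real measurements on Tempesta FW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Sequentially match @host against "group-<i>.com" for i in [0, n_rules).
 * Stores the index of the matched rule (or -1) in @matched and returns
 * the number of string comparisons performed.
 */
static size_t
seq_match_cost(size_t n_rules, const char *host, long *matched)
{
	char pat[64];
	size_t i;

	assert(host && matched);
	*matched = -1;
	for (i = 0; i < n_rules; i++) {
		snprintf(pat, sizeof(pat), "group-%zu.com", i);
		if (!strcmp(pat, host)) {
			*matched = (long)i;
			return i + 1;
		}
	}
	return n_rules;
}
```

A request for the last of 1000 hosts, or for an unknown host, pays for all 1000 comparisons, which is the worst case that the performance tests should cover for growing rule counts.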