Replies: 1 comment
Here is my answer to the ML:

Thank you for the insights. How about taking a look at #1570 to enhance the LB algorithms? I personally use `leastconn` almost all the time, because I like the idea of balancing between all the backend servers. But my use case is quite simple and I normally have only 2 or 3 servers. Since I always refer to the excellent blog post "Test driving 'power of two random choices' load balancing", maybe re-running that test with the current HAProxy version could be interesting, in case something has changed? I'm referring to the block at the end.

From my point of view, how about creating an alias like "power-of-two" for the algorithm setting, so that people out there get the idea why `random(2)` isn't that bad in HAProxy's case? Jm2c.
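For readers unfamiliar with the current syntax: there is no "power-of-two" alias today, so the behavior is selected with `balance random(2)`. A minimal backend sketch (backend name and server addresses are made up):

```
backend be_app
    balance random(2)
    server s1 192.0.2.11:8080 check
    server s2 192.0.2.12:8080 check
    server s3 192.0.2.13:8080 check
```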
Problem statement
In 3.2, a lot of work was done to reduce the locking cost in the LB algorithms, making `roundrobin` and `leastconn` way more performant than they were before, particularly on machines with many cores and multiple core complexes like EPYC CPUs, where `leastconn` used to be really bad in the past.
However, this work raised a question that's still valid: why do we still keep `roundrobin` as the default algorithm, except for historical reasons? In many of our tests we're using `random(2)`, aka `random`, because it's much better: it's not purely random, it also considers the server's load (it picks the server with the fewest outstanding requests between two random ones, thus stepping a little bit on `leastconn`'s feet). This is something that `roundrobin` cannot consider, by definition.

I've run a few tests to compare `roundrobin`, `leastconn` and `random`. The tests were run on an AMD EPYC 64-core/128-thread system, connected to one load generator over 25 Gbps and to 4 servers (25 Gbps each). The tests consisted in retrieving a 1 kB object from the servers. The request rate in requests/s, the bandwidth in bits/s, and the CPU usage of the LB algorithm / locking are reported, along with each algorithm's performance ratio to `roundrobin` (measured in req/s).

First test, using 2 thread groups (64 threads each). It can correspond to systems started with just `thread-groups 2` without other NUMA-specific directives, and may illustrate to some extent how it would behave on unified-cache CPUs (e.g. Intel), with the limitation that locking costs are generally lower on such systems:

We can see that in this case the locking of the RR algo was still quite dominant, resulting in lower overall performance.
Second test, enabling `cpu-policy performance`, which starts 8 thread groups of 16 threads each:

Here the LB algo has the lowest locking cost, thanks to the fact that there are in fact 8 independent round-robin schedulers running.
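The two setups above correspond roughly to global sections like the following (a sketch: `thread-groups` and `cpu-policy performance` are the directives quoted above, everything else is left at defaults):

```
# First test's setup (sketch):
global
    thread-groups 2

# Second test's setup (sketch):
global
    cpu-policy performance   # derives 8 groups of 16 threads from the EPYC topology
```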
Third test, with one of the 4 servers' connections degraded from 25 Gbps to 1 Gbps:
This last test, even if cheated by degrading one server's connection, perfectly illustrates what happens when a server's processing capacity is not stable over time. It might be taking varying loads, or dealing with unequal requests, or simply experiencing bursts of garbage-collector activity that slow it down for a few hundred milliseconds every few seconds. `roundrobin` isn't optimal in this case, as it will methodically keep assigning requests to such servers. `leastconn` is optimal in dealing with such cases, but it's more suited to long-lived connections and will not fairly distribute short-request traffic among servers weighted differently. For example, if some servers have a `slowstart` parameter, it will progressively raise their weight when they're added, but this only affects their ability to accept new concurrent requests/connections, not directly their share of the traffic.

And here `random` excels. By default it makes two draws (the number of draws is the argument and defaults to 2) and compares the servers' current number of connections to pick the least loaded one. This way, if a server experiences a temporary slowdown, it will keep its current requests longer and have a higher count than the others, making it less eligible for new requests under the random algorithm.

Proposal
Currently, the default load balancing algorithm is `roundrobin` when not specified. The proposal to improve the situation would be, in 3.3, to simply switch to `random` when not specified; the point being that if users do not specify the algorithm they want to use, they're fine with the one chosen for them by the product's designers. This one gives a better distribution on irregular farms and uses less CPU across varying hardware.

Any objection / support?
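Should the default change, anyone preferring the historical behavior could simply pin it explicitly; a minimal sketch (backend name and address are made up):

```
backend be_legacy
    balance roundrobin
    server s1 192.0.2.21:8080 check
```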
Note: also posted here on the mailing list.
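For readers curious about the mechanics, the two-draw selection described above can be sketched in a few lines of Python. This is an illustration of the power-of-two-choices idea, not HAProxy's actual C implementation; the connection bookkeeping is invented for the example:

```python
import random

def pick_p2c(conns, draws=2):
    """Power-of-two-choices: sample `draws` distinct server indices at
    random and return the one with the fewest outstanding connections."""
    candidates = random.sample(range(len(conns)), k=min(draws, len(conns)))
    return min(candidates, key=lambda i: conns[i])

# A server that is backed up (here index 3, with 70 outstanding requests)
# loses every draw it takes part in, so it naturally receives no new
# traffic until its count comes back down.
conns = [3, 9, 4, 70]
```

With a single draw this degenerates into plain random selection, and raising the number of draws makes it behave more and more like `leastconn`, which matches how the argument to `random(<draws>)` is described.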