Speed up Erlang 500x by using nodelay and system_time #18

Open · wants to merge 2 commits into master

Conversation


@baryluk baryluk commented Nov 24, 2018

gen_tcp by default uses heavy TCP socket buffering in user space and kernel space, and sends packets with a delay.

Disable that and send the available buffer immediately by setting `{nodelay, true}`, which sets the kernel socket option `TCP_NODELAY`. Many other web servers, Erlang and non-Erlang alike, enable this by default on TCP sockets, or at least on HTTP server sockets. It is not enabled by default in gen_tcp because it is not enabled by default in OS sockets on systems like Linux; gen_tcp simply follows the questionable practice of not changing OS defaults.
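As a minimal sketch (the port, packet mode, and other options here are illustrative, not taken from the patch), enabling the option on the listening socket looks like this; sockets returned by `gen_tcp:accept/1` inherit the listen socket's options:

```erlang
%% Illustrative only: open a listening socket with TCP_NODELAY enabled,
%% so small responses are flushed to the wire immediately instead of
%% waiting for Nagle's algorithm to coalesce them.
{ok, ListenSocket} = gen_tcp:listen(8080, [
    binary,
    {packet, http_bin},   %% parse HTTP requests in the driver (assumed mode)
    {reuseaddr, true},
    {nodelay, true}       %% sets TCP_NODELAY; inherited by accepted sockets
]).
```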

Additionally, replace `os:timestamp` with its modern equivalent, `erlang:system_time`. This should not have any significant impact on performance.
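A before/after sketch of the timestamp change (the millisecond unit is an assumption for illustration; the patch may use a different unit):

```erlang
%% Before: os:timestamp/0 returns {MegaSecs, Secs, MicroSecs} and needs
%% manual arithmetic to get a single number.
{Mega, Secs, Micro} = os:timestamp(),
MillisOld = (Mega * 1000000 + Secs) * 1000 + Micro div 1000,

%% After: erlang:system_time/1 returns the OS system time directly
%% in the requested unit.
MillisNew = erlang:system_time(millisecond).
```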

Finally, make minor style changes so the code follows standard Erlang coding style.

I am able to process 170k-200k requests per second on my machine. Yes: 200,000 requests per second.

With 1 thread and 1 connection from `wrk`, I get 13,000-15,000 requests/s, 66.5us average latency, and 500us max latency.

Example benchmark run on my machine. I added one more test with 500 concurrent connections at the end to show scalability.

```
user@debian:~/hit-server-bench/erlang$ ../do-benchmark.sh http://127.0.0.1:8080/
$ wrk -t 1 -c 1 -d 10 http://127.0.0.1:8080/
Running 10s test @ http://127.0.0.1:8080/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    77.12us    3.21us 305.00us   78.25%
    Req/Sec    12.87k   124.68    13.12k    82.18%
  129349 requests in 10.10s, 14.06MB read
Requests/sec:  12807.37
Transfer/sec:      1.39MB
$ wrk -t 2 -c 10 -d 10 http://127.0.0.1:8080/
Running 10s test @ http://127.0.0.1:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   110.93us   17.23us   1.42ms   74.62%
    Req/Sec    43.46k   754.90    44.69k    75.25%
  873702 requests in 10.10s, 94.99MB read
Requests/sec:  86504.58
Transfer/sec:      9.40MB
$ wrk -t 10 -c 50 -d 10 http://127.0.0.1:8080/
Running 10s test @ http://127.0.0.1:8080/
  10 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   309.18us  195.32us  15.87ms   93.68%
    Req/Sec    15.51k   845.19    18.84k    70.37%
  1556896 requests in 10.10s, 169.26MB read
Requests/sec: 154155.92
Transfer/sec:     16.76MB
$ wrk -t 20 -c 100 -d 10 http://127.0.0.1:8080/
Running 10s test @ http://127.0.0.1:8080/
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   638.73us  584.60us  18.11ms   92.79%
    Req/Sec     8.50k     1.25k   36.16k    91.41%
  1695472 requests in 10.10s, 184.33MB read
Requests/sec: 167867.90
Transfer/sec:     18.25MB
$ wrk -t 32 -c 500 -d 10 http://127.0.0.1:8080/
Running 10s test @ http://127.0.0.1:8080/
  32 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.99ms    2.60ms 206.81ms   89.92%
    Req/Sec     5.16k     1.04k   15.73k    86.45%
  1648791 requests in 10.08s, 179.26MB read
Requests/sec: 163524.01
Transfer/sec:     17.78MB
user@debian:~/hit-server-bench/erlang$
```

### Requests per second

| Environment    | Req/s (c=1) | Req/s (c=10) | Req/s (c=50) | Req/s (c=100) |
|----------------|------------:|-------------:|-------------:|--------------:|
| Node.js        |    10215.47 |     38447.10 |     51362.60 |      52722.19 |
| Erlang         |       24.70 |       246.98 |      1240.38 |       2474.18 |
| Erlang (fixed) |    12481.72 |     81996.85 |    152795.69 |     168411.95 |

A 500x throughput improvement.

### Average latency

| Environment    | Latency (c=1) | Latency (c=10) | Latency (c=50) | Latency (c=100) |
|----------------|--------------:|---------------:|---------------:|----------------:|
| Node.js        |       83.59us |       253.57us |         1.03ms |          2.19ms |
| Erlang         |       40.60ms |        40.58ms |        40.35ms |         40.43ms |
| Erlang (fixed) |       79.85us |       117.24us |       328.21us |        649.47us |

A 500x latency improvement.

I do not want to brag, but Erlang is now the fastest in the entire table, maybe with the exception of Netty. Obviously, I have no exact knowledge of what machine was used for the original tests. And the initial bad result should have raised more suspicion and not been published in the first place.

No compiler changes, Erlang runtime changes, or emulator flags/options were needed.

Quick way to run it: `erlc erlanghttp.erl && erl -noshell -s erlanghttp start`.

It is also important to note that the nodelay option has basically no effect on big responses: big responses (whether chunked/streamed, sent via sendfile, etc.) fill up the buffers anyway, so no delay is introduced in the first place. Only small responses are severely affected.

@baryluk baryluk mentioned this pull request Nov 24, 2018