BENCHMARK: Pipy 0.90 Multi-Thread HTTP/1.1
Yesterday Flomesh released Pipy 0.90. The main change in this release is the addition of multi-threading support; the detailed release notes can be found here.
Pipy was originally designed and built as a sidecar proxy. As the project evolved, it has been used in more and more non-sidecar scenarios. The MQTT support introduced in 0.50, for example, was added to serve users connecting hundreds of millions of IoT devices. The multi-threading mode introduced in the current 0.90 release targets users who want to build their own load balancers with Pipy on high-performance hardware: running Pipy on high-performance white-box servers can deliver load-balancing capability comparable to commercial hardware such as F5 BIG-IP, at a much lower overall cost.
Pipy's multi-threading implementation is built on asio's threading facilities, and it relies on the Linux kernel's port reuse (SO_REUSEPORT) to balance load across threads.
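To make the port-reuse mechanism concrete: with SO_REUSEPORT every worker opens its own listening socket on the same port, and the kernel spreads new connections across them. As a rough sketch (not the configuration used in this benchmark), a similar kernel-level distribution could also be obtained by starting several independent single-threaded pipy processes that all bind port 8080 with --reuse-port:

for i in 1 2 3 4; do
  pipy -e 'pipy().listen(8080).serveHTTP(new Message("hi"))' --reuse-port --threads=1 &
done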
In this benchmark we focus on three questions: whether the HTTP/1.1 request rate pipy can handle grows linearly with the number of threads in multi-threaded mode; what resources are needed to reach one million RPS on the given hardware; and whether there is any noticeable memory leak while handling basic HTTP/1.1 traffic under high load.
For the test hardware, we started with a server with a single Intel Xeon Gold 6144. Launched in 2017, this was a high-end processor of its time, with 8 cores / 16 threads and 24M of cache. We bought this kind of second-hand server for testing mainly for cost reasons. We ran pipy with 1, 2, 4, 8, and 12 threads, increasing the thread count step by step, and used wrk as the load generator. In the early stages we ran pipy and wrk on the same server; from 8 threads onward we also ran wrk on a separate AMD Ryzen 5 5600G desktop, with the Intel server running pipy and the AMD desktop running wrk connected by a 10G fiber link.

This round is a basic test of HTTP/1.1 parsing and network I/O, so the memory requirements are modest: the Intel server has 32G of RAM and the AMD desktop 64G, but very little memory was actually used in the tests. HTTP/1.1 is a protocol with a great many details, and it is the most widely used protocol on the Internet today. In this basic test, pipy simply returns "hi" via PipyJS, much like a hello-world test; it does not cover complex scenarios, but more complex scenarios can be built on top of it.
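For reference, the hello-world logic mentioned above is a single PipyJS expression. A sketch of the equivalent setup with the script saved to a file (the file name and thread count here are arbitrary, chosen only for illustration):

cat > hello.js <<'EOF'
pipy().listen(8080).serveHTTP(new Message("hi"))
EOF
pipy hello.js --reuse-port --threads=4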
The Intel server running pipy:
[root@localhost ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6144 CPU @ 3.50GHz
Stepping: 4
CPU MHz: 3500.000
BogoMIPS: 7000.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp_epp
[root@localhost ~]# free
total used free shared buff/cache available
Mem: 32253728 839788 22518024 17416 8895916 30993832
Swap: 16252924 0 16252924
The AMD desktop running wrk during the 12-thread test:
root@pve8:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 80
Model name: AMD Ryzen 5 5600G with Radeon Graphics
Stepping: 0
Frequency boost: enabled
CPU MHz: 782.422
CPU max MHz: 4463.6709
CPU min MHz: 1400.0000
BogoMIPS: 7785.53
Virtualization: AMD-V
L1d cache: 192 KiB
L1i cache: 192 KiB
L2 cache: 3 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-11
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
root@pve8:~# free
total used free shared buff/cache available
Mem: 61615912 48749632 5226164 59712 7640116 12094084
Swap: 8388604 54016 8334588
During the 12-thread test, the Intel server and the AMD desktop were connected over a 10G fiber link; the ping between them:
root@pve8:~# ping 10.10.6.1
PING 10.10.6.1 (10.10.6.1) 56(84) bytes of data.
64 bytes from 10.10.6.1: icmp_seq=1 ttl=64 time=0.084 ms
64 bytes from 10.10.6.1: icmp_seq=2 ttl=64 time=0.093 ms
64 bytes from 10.10.6.1: icmp_seq=3 ttl=64 time=0.094 ms
The server running pipy uses CentOS 7.9:
[root@localhost conf]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
[root@localhost conf]# uname -a
Linux localhost.localdomain 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
In the 12-thread test case, the AMD desktop running wrk uses Debian 11.1:
root@pve8:~# cat /etc/debian_version
11.1
root@pve8:~# uname -a
Linux pve8 5.13.19-2-pve #1 SMP PVE 5.13.19-4 (Mon, 29 Nov 2021 12:10:09 +0100) x86_64 GNU/Linux
Pipy was downloaded directly from the GitHub release page (download link). It was started with the command:
pipy -e 'pipy().listen(8080).serveHTTP(new Message("hi"))' --reuse-port --threads=12
In the tests we tried --threads=1, 2, 4, 8, 10, and 12 (plus a final run with 16 threads), and scaled the wrk thread count accordingly; the exact wrk settings are shown with each run below. In the pipy 8-thread case, one round of wrk used 6 threads and ran on the same host as pipy; for pipy at 8 threads and above, we additionally drove the load from the AMD desktop. The pipy and wrk versions are as follows:
[root@localhost conf]# pipy -v
Version : 0.90.0-18
Commit : d0ffc6f7613f8b6c4bf79461ea6b546eeb80b378
Commit Date : Thu, 26 Jan 2023 09:36:30 +0800
Host : Linux-5.15.0-1031-azure x86_64
OpenSSL : OpenSSL 1.1.1q 5 Jul 2022
Builtin GUI : No
Samples : No
[root@localhost conf]# wrk -v
wrk 4.2.0 [epoll] Copyright (C) 2012 Will Glozer
Usage: wrk <options> <url>
Options:
-c, --connections <N> Connections to keep open
-d, --duration <T> Duration of test
-t, --threads <N> Number of threads to use
-s, --script <S> Load Lua script file
-H, --header <H> Add header to request
--latency Print latency statistics
--timeout <T> Socket/request timeout
-v, --version Print version details
Numeric arguments may include a SI unit (1k, 1M, 1G)
Time arguments may include a time unit (2s, 2m, 2h)
Throughout the test, apart from raising the maximum number of open files, we did not tune any other kernel parameters.
[root@localhost conf]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 120116
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 120116
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
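For reference, the open-files limit shown above had been raised ahead of time. A common way to do this (a sketch, not necessarily the exact steps used on this machine) is either per shell session before starting pipy and wrk:

ulimit -n 1024000

or persistently via /etc/security/limits.conf:

* soft nofile 1024000
* hard nofile 1024000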
We did not do NUMA pinning in this test. In other tests we have observed that NUMA pinning benefits performance, improving it by close to 30% on certain CPUs.
[root@localhost conf]# numastat
node0
numa_hit 46354014
numa_miss 0
numa_foreign 0
interleave_hit 30100
local_node 46354014
other_node 0
[root@localhost conf]# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 32304 MB
node 0 free: 21986 MB
node distances:
node 0
0: 10
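Although we did not use it in this benchmark, on machines with more than one NUMA node pipy can be pinned to a single node with numactl; a sketch with an illustrative node number and thread count:

numactl --cpunodebind=0 --membind=0 pipy -e 'pipy().listen(8080).serveHTTP(new Message("hi"))' --reuse-port --threads=8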
Every test case was executed 3 times to check stability.
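Each of the runs below was launched by hand; a simple loop such as the following (a sketch) produces the same three back-to-back runs:

for i in 1 2 3; do
  wrk -c100 -t1 -d30 http://localhost:8080/ --latency
done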
[root@localhost ~]# wrk -c100 -t1 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
1 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 718.77us 22.50us 2.31ms 97.36%
Req/Sec 139.51k 2.16k 143.51k 90.33%
Latency Distribution
50% 719.00us
75% 724.00us
90% 728.00us
99% 751.00us
4167073 requests in 30.00s, 254.34MB read
Requests/sec: 138888.90
Transfer/sec: 8.48MB
[root@localhost ~]# wrk -c100 -t1 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
1 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 708.90us 24.64us 2.28ms 97.74%
Req/Sec 141.43k 2.14k 145.43k 90.00%
Latency Distribution
50% 709.00us
75% 715.00us
90% 720.00us
99% 740.00us
4224342 requests in 30.00s, 257.83MB read
Requests/sec: 140794.98
Transfer/sec: 8.59MB
[root@localhost ~]# wrk -c100 -t1 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
1 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 722.16us 26.61us 2.28ms 97.41%
Req/Sec 138.90k 3.05k 144.70k 71.00%
Latency Distribution
50% 726.00us
75% 735.00us
90% 739.00us
99% 749.00us
4147754 requests in 30.00s, 253.16MB read
Requests/sec: 138245.50
Transfer/sec: 8.44MB
top - 23:39:57 up 1 day, 25 min, 3 users, load average: 0.82, 0.59, 0.35
Tasks: 249 total, 1 running, 248 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 6.3 us, 40.5 sy, 0.0 ni, 34.6 id, 0.0 wa, 0.0 hi, 18.5 si, 0.0 st
%Cpu4 : 42.1 us, 31.8 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 26.2 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32253728 total, 22542776 free, 813532 used, 8897420 buff/cache
KiB Swap: 16252924 total, 16252924 free, 0 used. 31020088 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93877 root 20 0 157160 7528 3060 S 99.7 0.0 1:10.63 pipy9
93892 root 20 0 187816 3852 1320 S 83.4 0.0 0:09.05 wrk
[root@localhost ~]# wrk -c200 -t2 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
2 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 713.81us 101.03us 22.77ms 95.89%
Req/Sec 140.50k 4.80k 152.62k 91.83%
Latency Distribution
50% 681.00us
75% 761.00us
90% 771.00us
99% 1.08ms
8391393 requests in 30.00s, 512.17MB read
Requests/sec: 279667.72
Transfer/sec: 17.07MB
[root@localhost ~]# wrk -c200 -t2 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
2 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 696.38us 35.87us 11.98ms 97.95%
Req/Sec 144.00k 3.42k 152.96k 95.85%
Latency Distribution
50% 695.00us
75% 702.00us
90% 708.00us
99% 739.00us
8627069 requests in 30.10s, 526.55MB read
Requests/sec: 286570.16
Transfer/sec: 17.49MB
[root@localhost ~]# wrk -c200 -t2 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
2 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 686.39us 45.86us 2.30ms 75.95%
Req/Sec 146.05k 2.40k 153.49k 92.19%
Latency Distribution
50% 697.00us
75% 727.00us
90% 733.00us
99% 749.00us
8751690 requests in 30.10s, 534.16MB read
Requests/sec: 290723.14
Transfer/sec: 17.74MB
top - 23:43:42 up 1 day, 29 min, 3 users, load average: 0.58, 0.47, 0.35
Threads: 270 total, 5 running, 265 sleeping, 0 stopped, 0 zombie
%Cpu0 : 42.4 us, 35.4 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 22.2 si, 0.0 st
%Cpu1 : 1.7 us, 7.6 sy, 0.0 ni, 86.9 id, 0.0 wa, 0.0 hi, 3.8 si, 0.0 st
%Cpu2 : 4.0 us, 20.2 sy, 0.0 ni, 66.5 id, 0.0 wa, 0.0 hi, 9.2 si, 0.0 st
%Cpu3 : 1.3 us, 6.0 sy, 0.0 ni, 88.9 id, 0.0 wa, 0.0 hi, 3.7 si, 0.0 st
%Cpu4 : 7.2 us, 36.9 sy, 0.0 ni, 37.3 id, 0.0 wa, 0.0 hi, 18.5 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 42.2 us, 33.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 24.3 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32253728 total, 22544080 free, 812548 used, 8897100 buff/cache
KiB Swap: 16252924 total, 16252924 free, 0 used. 31021072 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93900 root 20 0 230892 7060 3060 R 99.9 0.0 0:11.72 pipy9
93901 root 20 0 230892 7060 3060 R 99.7 0.0 0:11.71 pipy9
93906 root 20 0 262492 4376 1408 R 72.5 0.0 0:08.80 wrk
93905 root 20 0 262492 4376 1408 R 72.2 0.0 0:08.42 wrk
[root@localhost ~]# wrk -c400 -t4 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 706.09us 252.17us 47.47ms 99.02%
Req/Sec 142.48k 6.71k 157.61k 86.33%
Latency Distribution
50% 682.00us
75% 730.00us
90% 770.00us
99% 0.96ms
17015503 requests in 30.00s, 1.01GB read
Requests/sec: 567093.25
Transfer/sec: 34.61MB
[root@localhost ~]# wrk -c400 -t4 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 683.24us 58.77us 15.00ms 79.55%
Req/Sec 146.56k 6.43k 157.10k 77.08%
Latency Distribution
50% 679.00us
75% 714.00us
90% 741.00us
99% 797.00us
17563955 requests in 30.11s, 1.05GB read
Requests/sec: 583406.21
Transfer/sec: 35.61MB
[root@localhost ~]# wrk -c400 -t4 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 671.29us 67.94us 8.17ms 68.98%
Req/Sec 149.04k 7.32k 164.91k 73.59%
Latency Distribution
50% 678.00us
75% 725.00us
90% 763.00us
99% 792.00us
17861568 requests in 30.10s, 1.06GB read
Requests/sec: 593334.86
Transfer/sec: 36.21MB
top - 00:17:33 up 1 day, 1:03, 3 users, load average: 1.51, 0.37, 0.17
Threads: 274 total, 9 running, 265 sleeping, 0 stopped, 0 zombie
%Cpu0 : 5.4 us, 21.4 sy, 0.0 ni, 61.8 id, 0.0 wa, 0.0 hi, 11.4 si, 0.0 st
%Cpu1 : 44.7 us, 33.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 22.0 si, 0.0 st
%Cpu2 : 42.9 us, 34.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 22.6 si, 0.0 st
%Cpu3 : 10.9 us, 38.3 sy, 0.0 ni, 30.5 id, 0.0 wa, 0.0 hi, 20.3 si, 0.0 st
%Cpu4 : 3.8 us, 14.1 sy, 0.0 ni, 75.5 id, 0.0 wa, 0.0 hi, 6.6 si, 0.0 st
%Cpu5 : 43.9 us, 34.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 21.6 si, 0.0 st
%Cpu6 : 42.7 us, 34.8 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 22.5 si, 0.0 st
%Cpu7 : 4.5 us, 16.3 sy, 0.0 ni, 72.3 id, 0.0 wa, 0.0 hi, 6.9 si, 0.0 st
%Cpu8 : 2.8 us, 12.6 sy, 0.0 ni, 78.6 id, 0.0 wa, 0.0 hi, 6.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 6.5 us, 27.7 sy, 0.0 ni, 51.8 id, 0.0 wa, 0.0 hi, 14.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 6.4 us, 27.0 sy, 0.0 ni, 52.8 id, 0.0 wa, 0.0 hi, 13.8 si, 0.0 st
KiB Mem : 32253728 total, 22530596 free, 826040 used, 8897092 buff/cache
KiB Swap: 16252924 total, 16252924 free, 0 used. 31007580 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93937 root 20 0 378356 19384 3068 R 99.9 0.1 0:21.62 pipy9
93939 root 20 0 378356 19384 3068 R 99.9 0.1 0:21.62 pipy9
93938 root 20 0 378356 19384 3068 R 99.7 0.1 0:21.61 pipy9
93936 root 20 0 378356 19384 3068 R 99.3 0.1 0:21.61 pipy9
93944 root 20 0 412008 5684 1420 R 78.4 0.0 0:16.63 wrk
93945 root 20 0 412008 5684 1420 R 78.4 0.0 0:16.62 wrk
93946 root 20 0 412008 5684 1420 R 77.4 0.0 0:16.62 wrk
93943 root 20 0 412008 5684 1420 R 77.1 0.0 0:16.75 wrk
Starting from the 8-thread case, we added runs driven from the AMD desktop, because the number of pipy threads plus wrk threads had reached or exceeded the total number of hardware threads on the Intel server.
[root@localhost ~]# wrk -c800 -t6 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
6 threads and 800 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.90ms 213.88us 10.58ms 62.50%
Req/Sec 147.38k 6.66k 159.94k 64.50%
Latency Distribution
50% 0.93ms
75% 1.04ms
90% 1.17ms
99% 1.32ms
26394824 requests in 30.01s, 1.57GB read
Requests/sec: 879636.03
Transfer/sec: 53.69MB
[root@localhost ~]# wrk -c800 -t6 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
6 threads and 800 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.92ms 227.16us 15.83ms 60.63%
Req/Sec 144.33k 7.16k 160.47k 65.83%
Latency Distribution
50% 1.00ms
75% 1.09ms
90% 1.15ms
99% 1.23ms
25853073 requests in 30.01s, 1.54GB read
Requests/sec: 861582.11
Transfer/sec: 52.59MB
[root@localhost ~]# wrk -c800 -t6 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
6 threads and 800 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.90ms 214.27us 19.25ms 55.80%
Req/Sec 147.55k 6.99k 162.61k 75.17%
Latency Distribution
50% 0.96ms
75% 1.06ms
90% 1.15ms
99% 1.24ms
26428474 requests in 30.01s, 1.58GB read
Requests/sec: 880760.83
Transfer/sec: 53.76MB
top - 00:21:24 up 1 day, 1:07, 3 users, load average: 1.78, 1.05, 0.50
Threads: 280 total, 16 running, 264 sleeping, 0 stopped, 0 zombie
%Cpu0 : 7.0 us, 8.4 sy, 0.0 ni, 81.2 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
%Cpu1 : 19.6 us, 43.3 sy, 0.0 ni, 18.6 id, 0.0 wa, 0.0 hi, 18.6 si, 0.0 st
%Cpu2 : 19.0 us, 44.3 sy, 0.0 ni, 16.3 id, 0.0 wa, 0.0 hi, 20.4 si, 0.0 st
%Cpu3 : 38.3 us, 36.9 sy, 0.0 ni, 4.0 id, 0.0 wa, 0.0 hi, 20.8 si, 0.0 st
%Cpu4 : 27.2 us, 31.3 sy, 0.0 ni, 23.5 id, 0.0 wa, 0.0 hi, 18.0 si, 0.0 st
%Cpu5 : 15.0 us, 51.7 sy, 0.0 ni, 11.6 id, 0.0 wa, 0.0 hi, 21.8 si, 0.0 st
%Cpu6 : 13.2 us, 35.9 sy, 0.0 ni, 35.3 id, 0.0 wa, 0.0 hi, 15.6 si, 0.0 st
%Cpu7 : 19.2 us, 32.3 sy, 0.0 ni, 34.7 id, 0.0 wa, 0.0 hi, 13.8 si, 0.0 st
%Cpu8 : 49.7 us, 33.4 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 16.9 si, 0.0 st
%Cpu9 : 15.0 us, 51.9 sy, 0.0 ni, 9.8 id, 0.0 wa, 0.0 hi, 23.3 si, 0.0 st
%Cpu10 : 53.2 us, 29.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 17.6 si, 0.0 st
%Cpu11 : 44.6 us, 36.6 sy, 0.0 ni, 1.7 id, 0.0 wa, 0.0 hi, 17.1 si, 0.0 st
%Cpu12 : 48.7 us, 32.3 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 18.7 si, 0.0 st
%Cpu13 : 36.7 us, 39.1 sy, 0.0 ni, 4.4 id, 0.0 wa, 0.0 hi, 19.9 si, 0.0 st
%Cpu14 : 50.5 us, 30.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 18.9 si, 0.0 st
%Cpu15 : 51.5 us, 32.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 16.3 si, 0.0 st
KiB Mem : 32253728 total, 22531304 free, 825596 used, 8896828 buff/cache
KiB Swap: 16252924 total, 16252924 free, 0 used. 31008024 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93961 root 20 0 673284 17168 3060 R 99.9 0.1 0:11.76 pipy9
93966 root 20 0 673284 17168 3060 R 99.9 0.1 0:11.76 pipy9
93967 root 20 0 673284 17168 3060 R 99.9 0.1 0:11.76 pipy9
93962 root 20 0 673284 17168 3060 R 99.7 0.1 0:11.75 pipy9
93963 root 20 0 673284 17168 3060 R 99.7 0.1 0:11.75 pipy9
93964 root 20 0 673284 17168 3060 R 99.7 0.1 0:11.75 pipy9
93965 root 20 0 673284 17168 3060 R 99.7 0.1 0:11.75 pipy9
93968 root 20 0 673284 17168 3060 R 99.7 0.1 0:11.75 pipy9
93973 root 20 0 561720 7988 1384 R 95.0 0.0 0:10.71 wrk
93976 root 20 0 561720 7988 1384 R 95.0 0.0 0:10.72 wrk
93972 root 20 0 561720 7988 1384 R 94.7 0.0 0:10.81 wrk
93974 root 20 0 561720 7988 1384 R 94.4 0.0 0:10.67 wrk
93975 root 20 0 561720 7988 1384 R 94.4 0.0 0:10.67 wrk
93971 root 20 0 561720 7988 1384 R 91.7 0.0 0:10.82 wrk
Note that starting from 8 pipy threads, results above one million RPS begin to appear.
root@pve8:~# wrk -c800 -t8 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
8 threads and 800 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.85ms 523.28us 52.97ms 91.47%
Req/Sec 116.81k 9.25k 173.89k 76.25%
Latency Distribution
50% 789.00us
75% 1.03ms
90% 1.30ms
99% 2.00ms
18598311 requests in 20.03s, 1.11GB read
Requests/sec: 928379.69
Transfer/sec: 56.66MB
root@pve8:~# wrk -c800 -t8 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
8 threads and 800 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 779.32us 515.44us 19.19ms 87.39%
Req/Sec 124.69k 9.15k 179.50k 69.52%
Latency Distribution
50% 639.00us
75% 0.99ms
90% 1.30ms
99% 1.76ms
19864558 requests in 20.10s, 1.18GB read
Requests/sec: 988290.11
Transfer/sec: 60.32MB
root@pve8:~# wrk -c800 -t8 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
8 threads and 800 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 752.40us 593.56us 46.36ms 91.47%
Req/Sec 130.36k 9.02k 164.74k 70.50%
Latency Distribution
50% 573.00us
75% 0.93ms
90% 1.18ms
99% 1.98ms
20756097 requests in 20.03s, 1.24GB read
Requests/sec: 1036149.19
Transfer/sec: 63.24MB
top - 00:32:01 up 1 day, 1:17, 3 users, load average: 1.64, 2.82, 1.86
Threads: 273 total, 9 running, 264 sleeping, 0 stopped, 0 zombie
%Cpu0 : 1.0 us, 1.0 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 37.3 us, 25.3 sy, 0.0 ni, 1.3 id, 0.0 wa, 0.0 hi, 36.0 si, 0.0 st
%Cpu2 : 58.7 us, 32.7 sy, 0.0 ni, 1.0 id, 0.0 wa, 0.0 hi, 7.7 si, 0.0 st
%Cpu3 : 52.5 us, 41.4 sy, 0.0 ni, 5.8 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu4 : 12.4 us, 10.7 sy, 0.0 ni, 74.5 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st
%Cpu5 : 43.5 us, 33.8 sy, 0.0 ni, 10.4 id, 0.0 wa, 0.0 hi, 12.4 si, 0.0 st
%Cpu6 : 2.3 us, 2.7 sy, 0.0 ni, 95.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 54.8 us, 30.2 sy, 0.0 ni, 2.7 id, 0.0 wa, 0.0 hi, 12.3 si, 0.0 st
%Cpu8 : 3.3 us, 2.9 sy, 0.0 ni, 92.6 id, 0.0 wa, 0.0 hi, 1.2 si, 0.0 st
%Cpu9 : 54.8 us, 33.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 12.0 si, 0.0 st
%Cpu10 : 44.5 us, 25.8 sy, 0.0 ni, 4.7 id, 0.0 wa, 0.0 hi, 25.1 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 6.8 us, 5.2 sy, 0.0 ni, 85.5 id, 0.0 wa, 0.0 hi, 2.4 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 36.6 us, 23.1 sy, 0.0 ni, 19.0 id, 0.0 wa, 0.0 hi, 21.4 si, 0.0 st
KiB Mem : 32253728 total, 22513244 free, 843416 used, 8897068 buff/cache
KiB Swap: 16252924 total, 16252924 free, 0 used. 30990204 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93963 root 20 0 673284 39556 3064 R 99.7 0.1 3:12.04 pipy9
93966 root 20 0 673284 39556 3064 R 99.7 0.1 3:12.06 pipy9
93961 root 20 0 673284 39556 3064 R 99.3 0.1 3:12.00 pipy9
93962 root 20 0 673284 39556 3064 R 99.3 0.1 3:12.03 pipy9
93964 root 20 0 673284 39556 3064 R 99.3 0.1 3:12.06 pipy9
93967 root 20 0 673284 39556 3064 R 99.3 0.1 3:12.01 pipy9
93968 root 20 0 673284 39556 3064 R 99.3 0.1 3:12.04 pipy9
93965 root 20 0 673284 39556 3064 R 99.0 0.1 3:12.04 pipy9
We can observe that with 10 pipy threads, pipy steadily delivers more than one million RPS; the third run even reached 1.16 million RPS.
root@pve8:~# wrk -c1000 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.01ms 1.06ms 64.50ms 96.78%
Req/Sec 106.25k 11.39k 143.87k 66.90%
Latency Distribution
50% 0.87ms
75% 1.10ms
90% 1.40ms
99% 5.57ms
21170346 requests in 20.08s, 1.26GB read
Requests/sec: 1054249.43
Transfer/sec: 64.35MB
root@pve8:~# wrk -c1000 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.97ms 1.06ms 47.04ms 96.60%
Req/Sec 111.95k 10.86k 153.71k 70.25%
Latency Distribution
50% 812.00us
75% 1.04ms
90% 1.36ms
99% 5.98ms
22303769 requests in 20.08s, 1.33GB read
Requests/sec: 1110522.81
Transfer/sec: 67.78MB
root@pve8:~# wrk -c1000 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.93ms 1.18ms 59.22ms 96.78%
Req/Sec 117.45k 12.51k 155.26k 68.40%
Latency Distribution
50% 766.00us
75% 1.02ms
90% 1.33ms
99% 6.58ms
23406622 requests in 20.08s, 1.40GB read
Requests/sec: 1165900.45
Transfer/sec: 71.16MB
top - 00:38:09 up 1 day, 1:24, 3 users, load average: 1.68, 1.80, 1.72
Tasks: 248 total, 1 running, 247 sleeping, 0 stopped, 0 zombie
%Cpu0 : 7.3 us, 5.0 sy, 0.0 ni, 85.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
%Cpu1 : 35.6 us, 19.7 sy, 0.0 ni, 33.1 id, 0.0 wa, 0.0 hi, 11.6 si, 0.0 st
%Cpu2 : 33.0 us, 23.3 sy, 0.0 ni, 37.3 id, 0.0 wa, 0.0 hi, 6.3 si, 0.0 st
%Cpu3 : 41.6 us, 25.3 sy, 0.0 ni, 15.2 id, 0.0 wa, 0.0 hi, 17.9 si, 0.0 st
%Cpu4 : 13.1 us, 10.4 sy, 0.0 ni, 73.8 id, 0.0 wa, 0.0 hi, 2.7 si, 0.0 st
%Cpu5 : 29.9 us, 20.7 sy, 0.0 ni, 36.9 id, 0.0 wa, 0.0 hi, 12.5 si, 0.0 st
%Cpu6 : 8.3 us, 5.6 sy, 0.0 ni, 83.4 id, 0.0 wa, 0.0 hi, 2.7 si, 0.0 st
%Cpu7 : 53.8 us, 32.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 14.0 si, 0.0 st
%Cpu8 : 32.6 us, 26.7 sy, 0.0 ni, 34.4 id, 0.0 wa, 0.0 hi, 6.3 si, 0.0 st
%Cpu9 : 54.6 us, 31.5 sy, 0.0 ni, 0.7 id, 0.0 wa, 0.0 hi, 13.2 si, 0.0 st
%Cpu10 : 32.6 us, 21.9 sy, 0.0 ni, 41.6 id, 0.0 wa, 0.0 hi, 3.9 si, 0.0 st
%Cpu11 : 53.5 us, 32.2 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 14.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 38.3 us, 25.9 sy, 0.0 ni, 28.5 id, 0.0 wa, 0.0 hi, 7.3 si, 0.0 st
%Cpu14 : 52.2 us, 34.3 sy, 0.0 ni, 1.7 id, 0.0 wa, 0.0 hi, 11.8 si, 0.0 st
%Cpu15 : 51.5 us, 32.6 sy, 0.0 ni, 1.0 id, 0.0 wa, 0.0 hi, 15.0 si, 0.0 st
KiB Mem : 32253728 total, 22524340 free, 831944 used, 8897444 buff/cache
KiB Swap: 16252924 total, 16252924 free, 0 used. 31001676 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
94022 root 20 0 820748 30468 3064 S 990.1 0.1 5:14.37 pipy9
We can observe that with 12 threads, the wrk results begin to show socket errors, mainly because the AMD desktop running wrk has only 12 hardware threads. We can also observe that with 12 threads, RPS stays steadily above 1.2 million.
root@pve8:~# wrk -c1200 -t12 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
12 threads and 1200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.89ms 1.22ms 28.10ms 96.60%
Req/Sec 111.20k 32.79k 167.96k 89.23%
Latency Distribution
50% 722.00us
75% 0.88ms
90% 1.11ms
99% 7.20ms
24400093 requests in 20.10s, 1.45GB read
Socket errors: connect 191, read 0, write 0, timeout 0
Requests/sec: 1214173.16
Transfer/sec: 74.11MB
root@pve8:~# wrk -c1200 -t12 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
12 threads and 1200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.99ms 1.54ms 34.87ms 95.07%
Req/Sec 101.08k 26.68k 166.28k 65.23%
Latency Distribution
50% 703.00us
75% 0.90ms
90% 1.18ms
99% 9.06ms
24176741 requests in 20.09s, 1.44GB read
Socket errors: connect 191, read 0, write 0, timeout 0
Requests/sec: 1203599.89
Transfer/sec: 73.46MB
root@pve8:~# wrk -c1200 -t12 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
12 threads and 1200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.97ms 1.49ms 42.10ms 95.13%
Req/Sec 102.44k 27.64k 163.32k 74.67%
Latency Distribution
50% 697.00us
75% 0.88ms
90% 1.15ms
99% 9.04ms
24497260 requests in 20.10s, 1.46GB read
Socket errors: connect 191, read 0, write 0, timeout 0
Requests/sec: 1219031.76
Transfer/sec: 74.40MB
top - 00:41:10 up 1 day, 1:27, 2 users, load average: 2.29, 1.91, 1.79
Tasks: 246 total, 1 running, 245 sleeping, 0 stopped, 0 zombie
%Cpu0 : 40.6 us, 21.5 sy, 0.0 ni, 26.8 id, 0.0 wa, 0.0 hi, 11.1 si, 0.0 st
%Cpu1 : 48.3 us, 31.3 sy, 0.0 ni, 3.7 id, 0.0 wa, 0.0 hi, 16.7 si, 0.0 st
%Cpu2 : 42.3 us, 26.6 sy, 0.0 ni, 25.9 id, 0.0 wa, 0.0 hi, 5.2 si, 0.0 st
%Cpu3 : 45.2 us, 31.0 sy, 0.0 ni, 10.2 id, 0.0 wa, 0.0 hi, 13.6 si, 0.0 st
%Cpu4 : 17.5 us, 12.7 sy, 0.0 ni, 64.6 id, 0.0 wa, 0.0 hi, 5.2 si, 0.0 st
%Cpu5 : 41.3 us, 27.4 sy, 0.0 ni, 18.4 id, 0.0 wa, 0.0 hi, 12.8 si, 0.0 st
%Cpu6 : 33.6 us, 17.4 sy, 0.0 ni, 40.9 id, 0.0 wa, 0.0 hi, 8.1 si, 0.0 st
%Cpu7 : 51.9 us, 29.6 sy, 0.0 ni, 6.1 id, 0.0 wa, 0.0 hi, 12.5 si, 0.0 st
%Cpu8 : 40.0 us, 24.6 sy, 0.0 ni, 19.3 id, 0.0 wa, 0.0 hi, 16.1 si, 0.0 st
%Cpu9 : 55.7 us, 30.0 sy, 0.0 ni, 1.0 id, 0.0 wa, 0.0 hi, 13.3 si, 0.0 st
%Cpu10 : 33.7 us, 20.4 sy, 0.0 ni, 36.9 id, 0.0 wa, 0.0 hi, 9.0 si, 0.0 st
%Cpu11 : 54.3 us, 30.5 sy, 0.0 ni, 2.0 id, 0.0 wa, 0.0 hi, 13.2 si, 0.0 st
%Cpu12 : 1.0 us, 0.7 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu13 : 39.3 us, 27.9 sy, 0.0 ni, 24.5 id, 0.0 wa, 0.0 hi, 8.3 si, 0.0 st
%Cpu14 : 51.4 us, 31.4 sy, 0.0 ni, 5.1 id, 0.0 wa, 0.0 hi, 12.2 si, 0.0 st
%Cpu15 : 43.9 us, 28.7 sy, 0.0 ni, 12.2 id, 0.0 wa, 0.0 hi, 15.2 si, 0.0 st
KiB Mem : 32253728 total, 22529704 free, 826476 used, 8897548 buff/cache
KiB Swap: 16252924 total, 16252924 free, 0 used. 31007172 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
94036 root 20 0 968212 26276 3060 S 1172 0.1 3:00.04 pipy9
Finally, we set the pipy thread count to the CPU's maximum thread count. From TC6 we already knew that the AMD host running wrk was close to its performance limit, but we still tried an even more extreme run. The result: with the wrk host saturated, pipy's throughput holds steady above 1.3 million RPS.
root@pve8:~# wrk -c1600 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
10 threads and 1600 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 667.16us 0.91ms 48.94ms 98.03%
Req/Sec 133.92k 22.06k 178.08k 81.39%
Latency Distribution
50% 571.00us
75% 725.00us
90% 0.88ms
99% 4.01ms
26672324 requests in 20.09s, 1.59GB read
Socket errors: connect 589, read 0, write 0, timeout 0
Requests/sec: 1327589.21
Transfer/sec: 81.03MB
root@pve8:~# wrk -c1600 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
10 threads and 1600 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 611.91us 733.13us 28.98ms 98.19%
Req/Sec 134.61k 28.27k 195.17k 84.54%
Latency Distribution
50% 498.00us
75% 721.00us
90% 0.92ms
99% 2.64ms
26798275 requests in 20.04s, 1.60GB read
Socket errors: connect 589, read 0, write 0, timeout 0
Requests/sec: 1337142.66
Transfer/sec: 81.61MB
root@pve8:~# wrk -c1600 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
10 threads and 1600 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 671.95us 0.89ms 34.32ms 97.99%
Req/Sec 131.52k 27.67k 187.85k 78.93%
Latency Distribution
50% 572.00us
75% 741.00us
90% 0.89ms
99% 4.11ms
26185447 requests in 20.08s, 1.56GB read
Socket errors: connect 589, read 0, write 0, timeout 0
Requests/sec: 1303802.77
Transfer/sec: 79.58MB
top - 00:52:04 up 1 day, 1:37, 2 users, load average: 6.93, 3.79, 2.74
Tasks: 247 total, 1 running, 246 sleeping, 0 stopped, 0 zombie
%Cpu0 : 52.2 us, 29.2 sy, 0.0 ni, 4.4 id, 0.0 wa, 0.0 hi, 14.2 si, 0.0 st
%Cpu1 : 46.0 us, 30.9 sy, 0.0 ni, 4.0 id, 0.0 wa, 0.0 hi, 19.1 si, 0.0 st
%Cpu2 : 46.6 us, 32.2 sy, 0.0 ni, 3.0 id, 0.0 wa, 0.0 hi, 18.1 si, 0.0 st
%Cpu3 : 46.3 us, 31.2 sy, 0.0 ni, 3.0 id, 0.0 wa, 0.0 hi, 19.5 si, 0.0 st
%Cpu4 : 48.1 us, 30.6 sy, 0.0 ni, 3.4 id, 0.0 wa, 0.0 hi, 17.8 si, 0.0 st
%Cpu5 : 47.7 us, 31.0 sy, 0.0 ni, 2.0 id, 0.0 wa, 0.0 hi, 19.3 si, 0.0 st
%Cpu6 : 53.2 us, 30.6 sy, 0.0 ni, 3.0 id, 0.0 wa, 0.0 hi, 13.1 si, 0.0 st
%Cpu7 : 51.4 us, 31.4 sy, 0.0 ni, 2.7 id, 0.0 wa, 0.0 hi, 14.5 si, 0.0 st
%Cpu8 : 45.8 us, 31.6 sy, 0.0 ni, 3.0 id, 0.0 wa, 0.0 hi, 19.5 si, 0.0 st
%Cpu9 : 54.4 us, 28.0 sy, 0.0 ni, 4.1 id, 0.0 wa, 0.0 hi, 13.5 si, 0.0 st
%Cpu10 : 47.0 us, 30.5 sy, 0.0 ni, 4.4 id, 0.0 wa, 0.0 hi, 18.1 si, 0.0 st
%Cpu11 : 50.9 us, 30.4 sy, 0.0 ni, 5.1 id, 0.0 wa, 0.0 hi, 13.7 si, 0.0 st
%Cpu12 : 52.0 us, 31.8 sy, 0.0 ni, 2.7 id, 0.0 wa, 0.0 hi, 13.5 si, 0.0 st
%Cpu13 : 47.0 us, 31.1 sy, 0.0 ni, 3.4 id, 0.0 wa, 0.0 hi, 18.6 si, 0.0 st
%Cpu14 : 52.5 us, 30.5 sy, 0.0 ni, 2.7 id, 0.0 wa, 0.0 hi, 14.2 si, 0.0 st
%Cpu15 : 45.8 us, 31.6 sy, 0.0 ni, 2.4 id, 0.0 wa, 0.0 hi, 20.2 si, 0.0 st
KiB Mem : 32253728 total, 22509992 free, 846128 used, 8897608 buff/cache
KiB Swap: 16252924 total, 16252924 free, 0 used. 30987524 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
94055 root 20 0 1263272 46092 3064 S 1531 0.1 18:29.62 pipy9
This is a simple HTTP/1.1 benchmark. From the results we can draw the following conclusions:
- As the number of pipy threads increases, throughput (RPS) grows almost linearly
- On the given Intel processor, pipy first produces one-million-RPS results at 8 threads, and stays above one million RPS from 10 threads on
- Throughout the test, the P50/P75/P90/P99 latency distributions recorded by wrk are stable with little variance, and almost no long-tail behavior is observed
- Throughout the test, pipy's memory usage remains stable, so we can reasonably conclude there is no memory leak during the test