fix: Add some crate features for performance #2477
Conversation
Let's see if they do. Also, @mxinden, I was wondering why we went with a multi-threaded `tokio` client and server. Are the thread-management overheads worth it compared to using just the `rt` scheduler?
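For concreteness, here is a minimal sketch of what "just the `rt` scheduler" would look like, assuming the binaries construct their runtime explicitly (neqo's actual setup may differ). The current-thread runtime needs only tokio's `rt` crate feature, whereas the default multi-threaded runtime also requires `rt-multi-thread`:

```rust
use tokio::runtime::Builder;

fn main() -> std::io::Result<()> {
    // Current-thread scheduler: a single event loop on the calling thread.
    // Only tokio's `rt` feature is required (plus `net`/`time` for the I/O
    // and timer drivers that `enable_all` turns on).
    let rt = Builder::new_current_thread().enable_all().build()?;

    // The whole client (or server) is one top-level future, so under this
    // scheduler it never crosses a thread boundary.
    rt.block_on(async {
        // ... drive the client or server future here ...
    });
    Ok(())
}
```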
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##             main    #2477   +/-   ##
=======================================
  Coverage   94.91%   94.91%
=======================================
  Files         115      115
  Lines       34286    34286
  Branches    34286    34286
=======================================
  Hits        32543    32543
  Misses       1734     1734
  Partials        9        9
Failed Interop Tests
QUIC Interop Runner, client vs. server; differences relative to 37c3aee.
- neqo-latest as client
- neqo-latest as server

All results

Succeeded Interop Tests
QUIC Interop Runner, client vs. server.
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests
QUIC Interop Runner, client vs. server.
- neqo-latest as client
- neqo-latest as server
Benchmark results
Performance differences relative to a341259.

- 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved. time: [198.58 ms 198.94 ms 199.30 ms] thrpt: [501.76 MiB/s 502.67 MiB/s 503.58 MiB/s] change: time: [−2.0523% −1.7659% −1.4888%] (p = 0.00 < 0.05) thrpt: [+1.5113% +1.7976% +2.0953%]
- 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected. time: [302.98 ms 304.36 ms 305.74 ms] thrpt: [32.708 Kelem/s 32.856 Kelem/s 33.006 Kelem/s] change: time: [−0.0873% +0.5531% +1.2132%] (p = 0.09 > 0.05) thrpt: [−1.1987% −0.5500% +0.0874%]
- 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected. time: [27.349 ms 27.439 ms 27.558 ms] thrpt: [36.287 B/s 36.444 B/s 36.564 B/s] change: time: [−0.7745% −0.2244% +0.3429%] (p = 0.45 > 0.05) thrpt: [−0.3417% +0.2249% +0.7806%]
- 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved. time: [622.90 ms 627.59 ms 632.27 ms] thrpt: [158.16 MiB/s 159.34 MiB/s 160.54 MiB/s] change: time: [−5.0735% −4.1248% −3.2188%] (p = 0.00 < 0.05) thrpt: [+3.3259% +4.3022% +5.3446%]
- decode 4096 bytes, mask ff: Change within noise threshold. time: [11.629 µs 11.672 µs 11.721 µs] change: [−1.4598% −1.0726% −0.5453%] (p = 0.00 < 0.05)
- decode 1048576 bytes, mask ff: Change within noise threshold. time: [3.0583 ms 3.0679 ms 3.0805 ms] change: [+0.7403% +1.2257% +1.7377%] (p = 0.00 < 0.05)
- decode 4096 bytes, mask 7f: 💚 Performance has improved. time: [19.363 µs 19.446 µs 19.562 µs] change: [−4.0416% −3.0524% −2.3825%] (p = 0.00 < 0.05)
- decode 1048576 bytes, mask 7f: Change within noise threshold. time: [5.0845 ms 5.0972 ms 5.1105 ms] change: [+0.4872% +0.8734% +1.2405%] (p = 0.00 < 0.05)
- decode 4096 bytes, mask 3f: 💚 Performance has improved. time: [5.5305 µs 5.5588 µs 5.5930 µs] change: [−33.157% −32.863% −32.534%] (p = 0.00 < 0.05)
- decode 1048576 bytes, mask 3f: 💔 Performance has regressed. time: [1.7873 ms 1.7997 ms 1.8123 ms] change: [+12.244% +13.138% +13.972%] (p = 0.00 < 0.05)
- 1000 streams of 1 bytes/multistream: 💔 Performance has regressed. time: [47.764 ns 47.945 ns 48.127 ns] change: [+29.708% +31.271% +32.821%] (p = 0.00 < 0.05)
- 1000 streams of 1000 bytes/multistream: 💔 Performance has regressed. time: [47.002 ns 47.177 ns 47.353 ns] change: [+22.520% +23.830% +25.204%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [88.065 ns 88.418 ns 88.774 ns] change: [−0.4540% −0.0593% +0.3303%] (p = 0.77 > 0.05)
- coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [105.56 ns 106.12 ns 106.83 ns] change: [−0.4586% +0.0028% +0.4900%] (p = 0.99 > 0.05)
- coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [104.67 ns 105.03 ns 105.46 ns] change: [−0.3186% +0.1905% +0.9788%] (p = 0.63 > 0.05)
- coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [88.603 ns 88.775 ns 88.973 ns] change: [−1.1413% −0.3373% +0.4524%] (p = 0.44 > 0.05)
- RxStreamOrderer::inbound_frame(): No change in performance detected. time: [107.89 ms 107.96 ms 108.03 ms] change: [−0.2639% −0.0098% +0.1676%] (p = 0.94 > 0.05)
- sent::Packets::take_ranges: No change in performance detected. time: [8.0893 µs 8.3207 µs 8.5443 µs] change: [−3.3996% +3.7280% +13.696%] (p = 0.52 > 0.05)
- transfer/pacing-false/varying-seeds: Change within noise threshold. time: [37.216 ms 37.293 ms 37.371 ms] change: [+0.5064% +0.8545% +1.2038%] (p = 0.00 < 0.05)
- transfer/pacing-true/varying-seeds: Change within noise threshold. time: [37.913 ms 38.031 ms 38.155 ms] change: [+0.7331% +1.1975% +1.6586%] (p = 0.00 < 0.05)
- transfer/pacing-false/same-seed: Change within noise threshold. time: [36.575 ms 36.650 ms 36.735 ms] change: [−0.6683% −0.3588% −0.0384%] (p = 0.02 < 0.05)
- transfer/pacing-true/same-seed: Change within noise threshold. time: [38.728 ms 38.817 ms 38.911 ms] change: [+1.6254% +1.9650% +2.2815%] (p = 0.00 < 0.05)

Client/server transfer results
Performance differences relative to a341259. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
I chose multi-threaded as it is the de-facto default. No other reason.
👍 Worth experimenting. Intuitively, given that it is only a single future, there is no cross-thread communication and thus no significant overhead.
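One way to run that experiment, sketched under made-up assumptions (the `workload` future below is a hypothetical stand-in for a transfer, not neqo's benchmark harness):

```rust
use std::time::Instant;

use tokio::runtime::Builder;

// Hypothetical stand-in for one client/server run: a single top-level
// future that yields back to the scheduler many times.
async fn workload() {
    for _ in 0..1_000_000 {
        tokio::task::yield_now().await;
    }
}

fn main() -> std::io::Result<()> {
    // Default multi-threaded scheduler (requires the `rt-multi-thread` feature).
    let multi = Builder::new_multi_thread().enable_all().build()?;
    let start = Instant::now();
    multi.block_on(workload());
    println!("multi_thread:   {:?}", start.elapsed());

    // Current-thread scheduler (`rt` feature only): no worker threads, so a
    // single future pays no cross-thread wakeup or work-stealing costs.
    let current = Builder::new_current_thread().enable_all().build()?;
    let start = Instant::now();
    current.block_on(workload());
    println!("current_thread: {:?}", start.elapsed());
    Ok(())
}
```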
I am fine merging here. That said, I would prefer individual pull requests per feature, to ensure that each change, and not just all changes as a whole, has a positive performance impact. In addition, I don't think we should merge this before we have reliable benchmarks, i.e., not before #2657 is fixed.
Branch: fix-features
Testbed: t-linux64-ms-280

🚨 1 Alert

Benchmark | Measure (Units) | View | Benchmark Result (Result Δ%) | Upper Boundary (Limit %)
---|---|---|---|---
decode 1048576 bytes, mask ff | Latency milliseconds (ms) | 📈 plot 🚷 threshold 🚨 alert (🔔) | 3.07 ms (+1.15%), baseline 3.04 ms | 3.07 ms (100.04%)
Click to view all benchmark results

Benchmark | Latency | Benchmark Result nanoseconds (ns) (Result Δ%) | Upper Boundary nanoseconds (ns) (Limit %)
---|---|---|---
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 📈 view plot 🚷 view threshold | 643,680,000.00 ns (-3.09%), baseline 664,191,369.86 ns | 732,128,729.16 ns (87.92%)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 📈 view plot 🚷 view threshold | 636,170,000.00 ns (+0.77%), baseline 631,338,493.15 ns | 833,045,917.44 ns (76.37%)
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 📈 view plot 🚷 view threshold | 27,135,000.00 ns (-0.21%), baseline 27,193,041.10 ns | 27,664,766.42 ns (98.09%)
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 📈 view plot 🚷 view threshold | 303,130,000.00 ns (-0.56%), baseline 304,823,150.68 ns | 316,111,262.80 ns (95.89%)
1000 streams of 1 bytes/multistream | 📈 view plot 🚷 view threshold | 35.80 ns (-4.67%), baseline 37.55 ns | 54.48 ns (65.71%)
1000 streams of 1000 bytes/multistream | 📈 view plot 🚷 view threshold | 33.94 ns (-8.45%), baseline 37.07 ns | 54.08 ns (62.76%)
RxStreamOrderer::inbound_frame() | 📈 view plot 🚷 view threshold | 111,300,000.00 ns (+0.73%), baseline 110,495,356.16 ns | 114,591,968.70 ns (97.13%)
coalesce_acked_from_zero 1+1 entries | 📈 view plot 🚷 view threshold | 88.48 ns (-0.23%), baseline 88.69 ns | 89.30 ns (99.08%)
coalesce_acked_from_zero 10+1 entries | 📈 view plot 🚷 view threshold | 106.24 ns (+0.28%), baseline 105.94 ns | 106.89 ns (99.39%)
coalesce_acked_from_zero 1000+1 entries | 📈 view plot 🚷 view threshold | 89.08 ns (-0.24%), baseline 89.29 ns | 91.62 ns (97.23%)
coalesce_acked_from_zero 3+1 entries | 📈 view plot 🚷 view threshold | 106.45 ns (-0.08%), baseline 106.53 ns | 107.40 ns (99.11%)
decode 1048576 bytes, mask 3f | 📈 view plot 🚷 view threshold | 1,760,800.00 ns (+8.76%), baseline 1,618,995.89 ns | 1,772,698.80 ns (99.33%)
decode 1048576 bytes, mask 7f | 📈 view plot 🚷 view threshold | 5,090,000.00 ns (+0.51%), baseline 5,064,032.88 ns | 5,092,127.33 ns (99.96%)
decode 1048576 bytes, mask ff | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 3,071,200.00 ns (+1.15%), baseline 3,036,191.78 ns | 3,069,876.22 ns (100.04%)
decode 4096 bytes, mask 3f | 📈 view plot 🚷 view threshold | 5,534.50 ns (-29.78%), baseline 7,881.99 ns | 10,226.42 ns (54.12%)
decode 4096 bytes, mask 7f | 📈 view plot 🚷 view threshold | 19,422.00 ns (-2.45%), baseline 19,910.25 ns | 20,417.73 ns (95.12%)
decode 4096 bytes, mask ff | 📈 view plot 🚷 view threshold | 11,627.00 ns (-1.57%), baseline 11,812.75 ns | 11,980.92 ns (97.05%)
sent::Packets::take_ranges | 📈 view plot 🚷 view threshold | 8,283.40 ns (-1.34%), baseline 8,396.24 ns | 8,610.25 ns (96.20%)
transfer/pacing-false/same-seed | 📈 view plot 🚷 view threshold | 35,787,000.00 ns (+2.48%), baseline 34,920,095.89 ns | 36,602,828.37 ns (97.77%)
transfer/pacing-false/varying-seeds | 📈 view plot 🚷 view threshold | 35,447,000.00 ns (+1.09%), baseline 35,063,780.82 ns | 36,793,170.04 ns (96.34%)
transfer/pacing-true/same-seed | 📈 view plot 🚷 view threshold | 37,384,000.00 ns (+2.34%), baseline 36,530,616.44 ns | 38,137,907.42 ns (98.02%)
transfer/pacing-true/varying-seeds | 📈 view plot 🚷 view threshold | 36,577,000.00 ns (+1.88%), baseline 35,901,520.55 ns | 37,524,596.72 ns (97.47%)
Branch: fix-features
Testbed: t-linux64-ms-279

Click to view all benchmark results

Benchmark | Latency | Benchmark Result nanoseconds (ns) (Result Δ%) | Upper Boundary nanoseconds (ns) (Limit %)
---|---|---|---
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 📈 view plot 🚷 view threshold | 627,590,000.00 ns (-16.09%), baseline 747,942,500.00 ns | 1,240,194,385.25 ns (50.60%)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 📈 view plot 🚷 view threshold | 198,940,000.00 ns (-52.41%), baseline 418,047,500.00 ns | 1,407,118,298.59 ns (14.14%)
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 📈 view plot 🚷 view threshold | 27,439,000.00 ns (+0.77%), baseline 27,228,750.00 ns | 28,333,645.74 ns (96.84%)
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 📈 view plot 🚷 view threshold | 304,360,000.00 ns (+1.17%), baseline 300,840,000.00 ns | 313,432,123.06 ns (97.11%)
1000 streams of 1 bytes/multistream | 📈 view plot 🚷 view threshold | 47.95 ns (+33.04%), baseline 36.04 ns | 69.67 ns (68.81%)
1000 streams of 1000 bytes/multistream | 📈 view plot 🚷 view threshold | 47.18 ns (+22.32%), baseline 38.57 ns | 62.10 ns (75.97%)
RxStreamOrderer::inbound_frame() | 📈 view plot 🚷 view threshold | 107,960,000.00 ns (-0.88%), baseline 108,920,000.00 ns | 116,055,769.08 ns (93.02%)
coalesce_acked_from_zero 1+1 entries | 📈 view plot 🚷 view threshold | 88.42 ns (+0.04%), baseline 88.39 ns | 89.08 ns (99.25%)
coalesce_acked_from_zero 10+1 entries | 📈 view plot 🚷 view threshold | 105.03 ns (-0.12%), baseline 105.16 ns | 106.22 ns (98.88%)
coalesce_acked_from_zero 1000+1 entries | 📈 view plot 🚷 view threshold | 88.78 ns (-2.06%), baseline 90.64 ns | 99.03 ns (89.65%)
coalesce_acked_from_zero 3+1 entries | 📈 view plot 🚷 view threshold | 106.12 ns (+0.17%), baseline 105.94 ns | 106.90 ns (99.27%)
decode 1048576 bytes, mask 3f | 📈 view plot 🚷 view threshold | 1,799,700.00 ns (+9.55%), baseline 1,642,750.00 ns | 2,054,214.67 ns (87.61%)
decode 1048576 bytes, mask 7f | 📈 view plot 🚷 view threshold | 5,097,200.00 ns (+0.65%), baseline 5,064,475.00 ns | 5,154,319.19 ns (98.89%)
decode 1048576 bytes, mask ff | 📈 view plot 🚷 view threshold | 3,067,900.00 ns (+0.93%), baseline 3,039,775.00 ns | 3,113,852.76 ns (98.52%)
decode 4096 bytes, mask 3f | 📈 view plot 🚷 view threshold | 5,558.80 ns (-26.89%), baseline 7,603.60 ns | 12,964.41 ns (42.88%)
decode 4096 bytes, mask 7f | 📈 view plot 🚷 view threshold | 19,446.00 ns (-2.06%), baseline 19,854.25 ns | 20,924.94 ns (92.93%)
decode 4096 bytes, mask ff | 📈 view plot 🚷 view threshold | 11,672.00 ns (-1.22%), baseline 11,816.50 ns | 12,226.78 ns (95.46%)
sent::Packets::take_ranges | 📈 view plot 🚷 view threshold | 8,320.70 ns (+1.78%), baseline 8,175.35 ns | 8,875.44 ns (93.75%)
transfer/pacing-false/same-seed | 📈 view plot 🚷 view threshold | 36,650,000.00 ns (+1.92%), baseline 35,961,000.00 ns | 39,418,403.09 ns (92.98%)
transfer/pacing-false/varying-seeds | 📈 view plot 🚷 view threshold | 37,293,000.00 ns (+2.89%), baseline 36,245,500.00 ns | 40,338,863.04 ns (92.45%)
transfer/pacing-true/same-seed | 📈 view plot 🚷 view threshold | 38,817,000.00 ns (+3.64%), baseline 37,452,500.00 ns | 42,197,319.94 ns (91.99%)
transfer/pacing-true/varying-seeds | 📈 view plot 🚷 view threshold | 38,031,000.00 ns (+3.16%), baseline 36,865,250.00 ns | 41,255,039.36 ns (92.19%)
Branch: fix-features
Testbed: t-linux64-ms-279

Click to view all benchmark results

Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %)
---|---|---|---
s2n vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 170.83 ms (-28.03%), baseline 237.36 ms | 538.39 ms (31.73%)
Branch: fix-features
Testbed: t-linux64-ms-278

Click to view all benchmark results

Benchmark | Latency | Benchmark Result nanoseconds (ns) (Result Δ%) | Upper Boundary nanoseconds (ns) (Limit %)
---|---|---|---
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 📈 view plot 🚷 view threshold | 644,780,000.00 ns (-22.62%), baseline 833,280,000.00 ns | 1,135,572,397.07 ns (56.78%)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 📈 view plot 🚷 view threshold | 655,620,000.00 ns (+0.28%), baseline 653,791,666.67 ns | 675,797,849.36 ns (97.01%)
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 📈 view plot 🚷 view threshold | 27,119,000.00 ns (-0.06%), baseline 27,134,166.67 ns | 27,330,233.80 ns (99.23%)
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 📈 view plot 🚷 view threshold | 298,380,000.00 ns (-0.07%), baseline 298,586,666.67 ns | 301,752,185.80 ns (98.88%)
1000 streams of 1 bytes/multistream | 📈 view plot 🚷 view threshold | 36.50 ns (+13.87%), baseline 32.05 ns | 44.97 ns (81.16%)
1000 streams of 1000 bytes/multistream | 📈 view plot 🚷 view threshold | 48.14 ns (+40.65%), baseline 34.23 ns | 57.58 ns (83.60%)
RxStreamOrderer::inbound_frame() | 📈 view plot 🚷 view threshold | 108,040,000.00 ns (+0.44%), baseline 107,566,666.67 ns | 108,789,772.31 ns (99.31%)
coalesce_acked_from_zero 1+1 entries | 📈 view plot 🚷 view threshold | 88.51 ns (-0.42%), baseline 88.88 ns | 90.79 ns (97.49%)
coalesce_acked_from_zero 10+1 entries | 📈 view plot 🚷 view threshold | 105.27 ns (-0.52%), baseline 105.82 ns | 107.01 ns (98.37%)
coalesce_acked_from_zero 1000+1 entries | 📈 view plot 🚷 view threshold | 90.32 ns (+1.01%), baseline 89.42 ns | 91.19 ns (99.05%)
coalesce_acked_from_zero 3+1 entries | 📈 view plot 🚷 view threshold | 105.76 ns (-0.53%), baseline 106.32 ns | 107.76 ns (98.14%)
decode 1048576 bytes, mask 3f | 📈 view plot 🚷 view threshold | 1,784,300.00 ns (+9.65%), baseline 1,627,216.67 ns | 1,863,844.66 ns (95.73%)
decode 1048576 bytes, mask 7f | 📈 view plot 🚷 view threshold | 5,098,800.00 ns (+0.77%), baseline 5,059,716.67 ns | 5,121,583.33 ns (99.56%)
decode 1048576 bytes, mask ff | 📈 view plot 🚷 view threshold | 3,073,400.00 ns (+1.09%), baseline 3,040,133.33 ns | 3,090,833.22 ns (99.44%)
decode 4096 bytes, mask 3f | 📈 view plot 🚷 view threshold | 5,558.10 ns (-29.14%), baseline 7,844.00 ns | 11,284.48 ns (49.25%)
decode 4096 bytes, mask 7f | 📈 view plot 🚷 view threshold | 19,372.00 ns (-2.70%), baseline 19,909.00 ns | 20,724.64 ns (93.47%)
decode 4096 bytes, mask ff | 📈 view plot 🚷 view threshold | 11,642.00 ns (-1.47%), baseline 11,816.00 ns | 12,085.52 ns (96.33%)
sent::Packets::take_ranges | 📈 view plot 🚷 view threshold | 8,293.30 ns (+1.08%), baseline 8,204.87 ns | 8,385.40 ns (98.90%)
transfer/pacing-false/same-seed | 📈 view plot 🚷 view threshold | 35,240,000.00 ns (+1.40%), baseline 34,753,000.00 ns | 35,696,133.31 ns (98.72%)
transfer/pacing-false/varying-seeds | 📈 view plot 🚷 view threshold | 35,181,000.00 ns (+1.13%), baseline 34,789,000.00 ns | 35,833,247.53 ns (98.18%)
transfer/pacing-true/same-seed | 📈 view plot 🚷 view threshold | 36,688,000.00 ns (+1.16%), baseline 36,266,166.67 ns | 37,342,750.44 ns (98.25%)
transfer/pacing-true/varying-seeds | 📈 view plot 🚷 view threshold | 36,264,000.00 ns (+1.93%), baseline 35,575,833.33 ns | 36,866,018.78 ns (98.37%)
Branch: fix-features
Testbed: t-linux64-ms-278

Click to view all benchmark results

Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %)
---|---|---|---
s2n vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 300.26 ms (-1.12%), baseline 303.67 ms | 315.23 ms (95.25%)
Benchmark results
Performance differences relative to 5387454.

- 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved. time: [199.06 ms 199.35 ms 199.65 ms] thrpt: [500.87 MiB/s 501.62 MiB/s 502.36 MiB/s] change: time: [−2.2498% −1.8915% −1.5688%] (p = 0.00 < 0.05) thrpt: [+1.5938% +1.9280% +2.3015%]
- 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected. time: [300.97 ms 302.53 ms 304.09 ms] thrpt: [32.885 Kelem/s 33.054 Kelem/s 33.226 Kelem/s] change: time: [−1.1631% −0.4865% +0.1677%] (p = 0.17 > 0.05) thrpt: [−0.1674% +0.4889% +1.1768%]
- 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected. time: [27.466 ms 27.552 ms 27.652 ms] thrpt: [36.164 B/s 36.295 B/s 36.408 B/s] change: time: [−0.7318% −0.1418% +0.3798%] (p = 0.63 > 0.05) thrpt: [−0.3784% +0.1420% +0.7372%]
- 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved. time: [632.99 ms 636.81 ms 640.61 ms] thrpt: [156.10 MiB/s 157.03 MiB/s 157.98 MiB/s] change: time: [−3.5012% −2.5727% −1.5710%] (p = 0.00 < 0.05) thrpt: [+1.5961% +2.6406% +3.6282%]
- decode 4096 bytes, mask ff: 💚 Performance has improved. time: [11.617 µs 11.651 µs 11.693 µs] change: [−1.8753% −1.5051% −1.1287%] (p = 0.00 < 0.05)
- decode 1048576 bytes, mask ff: Change within noise threshold. time: [3.0609 ms 3.0704 ms 3.0816 ms] change: [+0.8765% +1.3615% +1.8152%] (p = 0.00 < 0.05)
- decode 4096 bytes, mask 7f: 💚 Performance has improved. time: [19.380 µs 19.433 µs 19.490 µs] change: [−3.0589% −2.6482% −2.2598%] (p = 0.00 < 0.05)
- decode 1048576 bytes, mask 7f: Change within noise threshold. time: [5.0899 ms 5.1123 ms 5.1425 ms] change: [+0.2416% +0.9940% +1.8198%] (p = 0.01 < 0.05)
- decode 4096 bytes, mask 3f: 💚 Performance has improved. time: [5.5223 µs 5.5392 µs 5.5632 µs] change: [−33.607% −33.133% −32.618%] (p = 0.00 < 0.05)
- decode 1048576 bytes, mask 3f: 💔 Performance has regressed. time: [1.7577 ms 1.7579 ms 1.7580 ms] change: [+9.9304% +10.412% +10.805%] (p = 0.00 < 0.05)
- 1000 streams of 1 bytes/multistream: 💔 Performance has regressed. time: [46.961 ns 47.141 ns 47.319 ns] change: [+25.604% +27.038% +28.507%] (p = 0.00 < 0.05)
- 1000 streams of 1000 bytes/multistream: 💔 Performance has regressed. time: [46.879 ns 47.087 ns 47.298 ns] change: [+31.174% +32.753% +34.387%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [88.056 ns 88.395 ns 88.731 ns] change: [−0.7580% −0.1239% +0.6309%] (p = 0.74 > 0.05)
- coalesce_acked_from_zero 3+1 entries: Change within noise threshold. time: [105.52 ns 105.89 ns 106.27 ns] change: [−1.1268% −0.6448% −0.1497%] (p = 0.01 < 0.05)
- coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [104.94 ns 105.35 ns 105.87 ns] change: [−0.6239% +0.1725% +1.2132%] (p = 0.77 > 0.05)
- coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [88.740 ns 88.837 ns 88.954 ns] change: [−1.4282% −0.4340% +0.6132%] (p = 0.43 > 0.05)
- RxStreamOrderer::inbound_frame(): No change in performance detected. time: [107.94 ms 108.07 ms 108.22 ms] change: [−0.4014% −0.1339% +0.0813%] (p = 0.32 > 0.05)
- sent::Packets::take_ranges: No change in performance detected. time: [8.0711 µs 8.2591 µs 8.4299 µs] change: [−1.8308% +4.6497% +16.282%] (p = 0.39 > 0.05)
- transfer/pacing-false/varying-seeds: Change within noise threshold. time: [36.908 ms 37.009 ms 37.125 ms] change: [+0.5332% +0.9068% +1.2643%] (p = 0.00 < 0.05)
- transfer/pacing-true/varying-seeds: Change within noise threshold. time: [37.866 ms 37.985 ms 38.106 ms] change: [+1.2718% +1.7401% +2.1818%] (p = 0.00 < 0.05)
- transfer/pacing-false/same-seed: Change within noise threshold. time: [36.809 ms 36.895 ms 36.995 ms] change: [+1.4011% +1.7223% +2.0394%] (p = 0.00 < 0.05)
- transfer/pacing-true/same-seed: Change within noise threshold. time: [38.809 ms 38.888 ms 38.970 ms] change: [+1.9028% +2.2730% +2.6189%] (p = 0.00 < 0.05)
Client/server transfer results
Performance differences relative to 5387454. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
I think there is a weak signal that there is some performance benefit here.
Client/server transfer results
Performance differences relative to 76a8a60. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Benchmark results
Performance differences relative to 76a8a60.

- 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold. time: [200.62 ms 200.95 ms 201.27 ms] thrpt: [496.84 MiB/s 497.64 MiB/s 498.45 MiB/s] change: time: [+0.3888% +0.6600% +0.9123%] (p = 0.00 < 0.05) thrpt: [−0.9041% −0.6556% −0.3872%]
- 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected. time: [303.63 ms 305.10 ms 306.58 ms] thrpt: [32.618 Kelem/s 32.776 Kelem/s 32.935 Kelem/s] change: time: [−1.1869% −0.5339% +0.1572%] (p = 0.12 > 0.05) thrpt: [−0.1570% +0.5368% +1.2011%]
- 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: Change within noise threshold. time: [27.593 ms 27.711 ms 27.851 ms] thrpt: [35.906 B/s 36.087 B/s 36.241 B/s] change: time: [+0.0765% +0.6617% +1.2188%] (p = 0.02 < 0.05) thrpt: [−1.2042% −0.6573% −0.0765%]
- 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: No change in performance detected. time: [632.30 ms 636.66 ms 640.95 ms] thrpt: [156.02 MiB/s 157.07 MiB/s 158.15 MiB/s] change: time: [−0.5513% +0.3484% +1.1860%] (p = 0.44 > 0.05) thrpt: [−1.1721% −0.3472% +0.5543%]
- decode 4096 bytes, mask ff: Change within noise threshold. time: [11.608 µs 11.748 µs 12.010 µs] change: [−2.9510% −1.9178% −0.7981%] (p = 0.00 < 0.05)
- decode 1048576 bytes, mask ff: Change within noise threshold. time: [3.0624 ms 3.0717 ms 3.0826 ms] change: [+0.9793% +1.4487% +1.8947%] (p = 0.00 < 0.05)
- decode 4096 bytes, mask 7f: 💚 Performance has improved. time: [19.333 µs 19.372 µs 19.417 µs] change: [−4.3908% −3.7307% −3.2190%] (p = 0.00 < 0.05)
- decode 1048576 bytes, mask 7f: Change within noise threshold. time: [5.0855 ms 5.0988 ms 5.1138 ms] change: [+0.4164% +0.8193% +1.1970%] (p = 0.00 < 0.05)
- decode 4096 bytes, mask 3f: 💚 Performance has improved. time: [5.5231 µs 5.5830 µs 5.6974 µs] change: [−33.422% −32.886% −32.152%] (p = 0.00 < 0.05)
- decode 1048576 bytes, mask 3f: 💔 Performance has regressed. time: [1.7579 ms 1.7608 ms 1.7651 ms] change: [+9.4512% +10.305% +10.985%] (p = 0.00 < 0.05)
- 1000 streams of 1 bytes/multistream: 💔 Performance has regressed. time: [36.759 ns 37.233 ns 37.706 ns] change: [+30.034% +31.950% +33.949%] (p = 0.00 < 0.05)
- 1000 streams of 1000 bytes/multistream: 💔 Performance has regressed. time: [36.784 ns 41.017 ns 49.089 ns] change: [+29.043% +44.089% +72.664%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [87.836 ns 88.148 ns 88.457 ns] change: [−0.8316% −0.2213% +0.3933%] (p = 0.49 > 0.05)
- coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [105.48 ns 105.80 ns 106.13 ns] change: [−0.1882% +0.1444% +0.5082%] (p = 0.42 > 0.05)
- coalesce_acked_from_zero 10+1 entries: Change within noise threshold. time: [104.66 ns 104.90 ns 105.25 ns] change: [−2.0151% −1.1299% −0.4133%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [88.792 ns 89.062 ns 89.469 ns] change: [−0.9132% +0.0697% +1.0494%] (p = 0.89 > 0.05)
- RxStreamOrderer::inbound_frame(): No change in performance detected. time: [108.18 ms 108.29 ms 108.44 ms] change: [−0.0366% +0.0835% +0.2346%] (p = 0.26 > 0.05)
- sent::Packets::take_ranges: No change in performance detected. time: [7.9960 µs 8.2006 µs 8.3887 µs] change: [−1.8530% +4.6956% +15.160%] (p = 0.39 > 0.05)
- transfer/pacing-false/varying-seeds: Change within noise threshold. time: [36.552 ms 36.618 ms 36.684 ms] change: [−2.4071% −2.0832% −1.7592%] (p = 0.00 < 0.05)
- transfer/pacing-true/varying-seeds: Change within noise threshold. time: [37.331 ms 37.444 ms 37.563 ms] change: [−3.0125% −2.5383% −2.1055%] (p = 0.00 < 0.05)
- transfer/pacing-false/same-seed: Change within noise threshold. time: [36.476 ms 36.537 ms 36.599 ms] change: [−2.4903% −2.2703% −2.0461%] (p = 0.00 < 0.05)
- transfer/pacing-true/same-seed: Change within noise threshold. time: [38.046 ms 38.131 ms 38.215 ms] change: [−3.0493% −2.7758% −2.4806%] (p = 0.00 < 0.05)