IO Driver sensitive to scheduling delays #7612
Unanswered
olivergillespie
asked this question in General
TL;DR: Scheduling delays are common, and Tokio’s IO driver is especially sensitive to them. Can/should Tokio do anything about it?
Overview
Scheduling delays over 1ms are common [1] in many typical Linux configurations.
Scheduling delays that hit Tokio workers are usually limited to impacting one task, but a delay to the IO driver can block progress for some or all worker threads, since no IO events can be delivered until it runs again. The IO driver is also especially likely to experience scheduling delays: the blocking epoll_wait syscall in every turn means the driver thread sleeps and must be rescheduled on each wakeup, giving the kernel scheduler a fresh opportunity to delay it.
Thus, the expected impact of scheduling delays is amplified for Tokio applications that frequently perform IO.
Demonstration
(TODO: minimal repro with a busy background system, the IO worker hit by a delay, and all in-flight requests paused; a rough sketch follows.)
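A hypothetical sketch of what such a repro might look like (not from the original post): it saturates every core with plain spinning threads rather than reproducing a specific scheduler configuration, so the delays observed will vary by machine. Assumes the `tokio` crate with the "full" feature set.

```rust
use std::time::{Duration, Instant};
use tokio::io::{AsyncReadExt, AsyncWriteExt};

fn main() {
    // Busy background system: one spinning thread per core competes with the
    // runtime's threads, including whichever one is driving epoll.
    let cores = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    for _ in 0..cores {
        std::thread::spawn(|| loop {
            std::hint::spin_loop();
        });
    }

    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async {
        // Loopback echo server.
        let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
        let addr = listener.local_addr().unwrap();
        tokio::spawn(async move {
            loop {
                let (mut sock, _) = listener.accept().await.unwrap();
                tokio::spawn(async move {
                    let (mut r, mut w) = sock.split();
                    let _ = tokio::io::copy(&mut r, &mut w).await;
                });
            }
        });

        // Many concurrent in-flight requests. If the driver thread is delayed,
        // the expectation is a correlated burst of slow responses, not an
        // isolated outlier on a single task.
        let mut handles = Vec::new();
        for _ in 0..32 {
            handles.push(tokio::spawn(async move {
                let mut worst = Duration::ZERO;
                for _ in 0..500 {
                    let start = Instant::now();
                    let mut s = tokio::net::TcpStream::connect(addr).await.unwrap();
                    s.write_all(b"ping").await.unwrap();
                    let mut buf = [0u8; 4];
                    s.read_exact(&mut buf).await.unwrap();
                    worst = worst.max(start.elapsed());
                }
                worst
            }));
        }
        for h in handles {
            println!("worst latency: {:?}", h.await.unwrap());
        }
    });
}
```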
We see below, with data from my application, that scheduling delays usually have little impact, but delays to the IO driver reliably cause bursts of slow requests. The IO driver is especially sensitive to scheduling delays.
Requests are marked as individual green points on the scatter plot with their end time and latency. Scheduling delays (recorded with `bpftrace`) over 2ms affecting any of the Tokio worker threads are overlaid as red bands. Time spent in `io::driver::Driver::turn` is also traced with `bpftrace` and overlaid in blue. Where this overlaps with a scheduling delay, it shows as purple.

Mitigations
Scheduling delays can be reduced by tuning the kernel scheduler, using CPU affinity and isolation, changing scheduling priorities, or using a real-time scheduling policy. These may partially (tuning) or entirely (isolation) avoid scheduling delays for Tokio workers. Each of these has tradeoffs, though, and requires manual, sometimes brittle configuration; a sketch of the affinity option follows.
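As an illustration of the affinity option, a minimal sketch, assuming Linux, the `libc` crate, and that cores 0-1 have been reserved for the runtime (e.g. via `isolcpus`); real code should check return values:

```rust
use tokio::runtime::Builder;

fn main() {
    let rt = Builder::new_multi_thread()
        .enable_all()
        // Runs on every runtime thread as it starts, so the IO driver is
        // covered no matter which thread ends up driving it.
        .on_thread_start(|| unsafe {
            let mut set: libc::cpu_set_t = std::mem::zeroed();
            libc::CPU_ZERO(&mut set);
            libc::CPU_SET(0, &mut set); // assumed reserved core
            libc::CPU_SET(1, &mut set); // assumed reserved core
            // pid 0 means "the current thread".
            let _ = libc::sched_setaffinity(0, std::mem::size_of::<libc::cpu_set_t>(), &set);
        })
        .build()
        .unwrap();
    rt.block_on(async { /* application */ });
}
```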
Splitting one runtime into several reduces the maximum impact of any one delay, since each runtime has its own IO driver, but it adds configuration complexity while reducing efficiency and limiting work-stealing (see the sketch below).
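A minimal sketch of the several-runtimes layout (hypothetical; the sharding policy is left abstract):

```rust
fn main() {
    let make_rt = || {
        tokio::runtime::Builder::new_multi_thread()
            .worker_threads(2)
            .enable_all()
            .build()
            .unwrap()
    };
    // Two independent runtimes, each with its own IO driver, so a scheduling
    // delay to one driver only stalls the connections homed on that runtime.
    let rt_a = make_rt();
    let rt_b = make_rt();

    // Tasks spawned on rt_a can never be stolen by rt_b's workers, which is
    // the work-stealing / efficiency cost mentioned above.
    let a = rt_a.spawn(async { /* serve shard A of the connections */ });
    let b = rt_b.spawn(async { /* serve shard B of the connections */ });

    rt_a.block_on(a).unwrap();
    rt_b.block_on(b).unwrap();
}
```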
Detection
TODO: details. Users can monitor scheduling delays with `perf sched` or the sched tracepoints via `bpftrace` or similar. These can be correlated with request latencies, and with IO driver execution times by adding further tracing.
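For example, a hedged `bpftrace` sketch. The 2ms threshold and the thread name are assumptions (Tokio's default worker thread name, truncated to the kernel's 15-character comm limit, is "tokio-runtime-w"), and it approximates scheduling delay as wakeup-to-run time, so it misses delays from preemption:

```
// Record when a Tokio worker thread becomes runnable.
tracepoint:sched:sched_wakeup
/str(args->comm) == "tokio-runtime-w"/
{
	@woke[args->pid] = nsecs;
}

// Report if it took more than 2ms to actually get back on CPU.
tracepoint:sched:sched_switch
/@woke[args->next_pid]/
{
	$delay_us = (nsecs - @woke[args->next_pid]) / 1000;
	delete(@woke[args->next_pid]);
	if ($delay_us > 2000) {
		printf("pid %d delayed %d us\n", args->next_pid, $delay_us);
	}
}
```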
Discussion
Tokio cannot avoid scheduling delays, but it may be able to reduce their impact where its own architectural choices amplify the problem, which appears to be the case with the IO driver.
Mitigations may reduce performance for well-behaved systems, or trade throughput for better tail latency, so they may need to be user-configurable.
[1] The Linux scheduler's tick rate is set by CONFIG_HZ; 100, 250, and 1000 Hz are the common options, corresponding to tick periods of 10ms, 4ms, and 1ms respectively. Along with a variety of other details, this means in practice that a busy M:N system (M runnable threads competing for N cores, with M > N) is bound to frequently experience 1ms+ delays, and this is not a bug.
Replies: 1 comment 1 reply

My initial reaction to this is that you might be putting too much work on a single machine.