Commit 0a65bc2

jdamato-fsly authored and brauner committed
eventpoll: Set epoll timeout if it's in the future
Avoid an edge case where epoll_wait arms a timer and calls schedule() even if the timer will expire immediately. For example: if the user has specified an epoll busy poll usecs value equal to or larger than the epoll_wait/epoll_pwait2 timeout, it is unnecessary to call schedule_hrtimeout_range; the busy poll usecs have consumed the entire timeout duration, so inducing scheduling latency by calling schedule() (via schedule_hrtimeout_range) is unnecessary.

This can be measured using a simple bpftrace script:

    tracepoint:sched:sched_switch / args->prev_pid == $1 /
    {
        print(kstack());
        print(ustack());
    }

Before this patch is applied, testing an epoll_wait app with busy poll usecs set to 1000 and the epoll_wait timeout set to 1ms, the script above shows:

    __traceiter_sched_switch+69
    __schedule+1495
    schedule+32
    schedule_hrtimeout_range+159
    do_epoll_wait+1424
    __x64_sys_epoll_wait+97
    do_syscall_64+95
    entry_SYSCALL_64_after_hwframe+118

    epoll_wait+82

This is unexpected; the busy poll usecs should have consumed the entire timeout, so there should be no reason to arm a timer.

After this patch is applied, the same test scenario does not generate a call to schedule() in the above edge case. If the busy poll usecs are reduced (for example usecs: 100, epoll_wait timeout 1ms), the timer is armed as expected.

Fixes: bf3b9f6 ("epoll: Add busy poll support to epoll with socket fds.")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Link: https://lore.kernel.org/20250416185826.26375-1-jdamato@fastly.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
1 parent a681b7c commit 0a65bc2
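For context, the per-instance busy poll parameters referenced in the commit message can be set with the EPIOCSPARAMS ioctl (available since v6.9). The following is a minimal, hypothetical sketch of the scenario the commit describes, not the author's actual test program; the epoll_params definitions are copied from the uapi header for self-containment, and socket setup and most error handling are elided.

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/ioctl.h>

    /* Copied from include/uapi/linux/eventpoll.h for illustration. */
    struct epoll_params {
            uint32_t busy_poll_usecs;   /* busy poll duration per epoll_wait */
            uint16_t busy_poll_budget;  /* max packets per NAPI poll */
            uint8_t prefer_busy_poll;
            uint8_t __pad;              /* must be zero */
    };
    #define EPIOCSPARAMS _IOW(0x8A, 0x01, struct epoll_params)

    int main(void)
    {
            struct epoll_event ev;
            int epfd = epoll_create1(0);

            /* 1000 busy poll usecs equals the 1 ms epoll_wait timeout
             * below, which is exactly the edge case this commit fixes. */
            struct epoll_params params = {
                    .busy_poll_usecs = 1000,
                    .busy_poll_budget = 8,
            };
            if (ioctl(epfd, EPIOCSPARAMS, &params) < 0)
                    perror("EPIOCSPARAMS");

            /* ... add one or more sockets with epoll_ctl(), then: */
            epoll_wait(epfd, &ev, 1, 1 /* ms */);
            return 0;
    }

Before the patch, a wait like this would still arm an hrtimer and call schedule() even though busy polling had already consumed the full timeout.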

1 file changed

fs/eventpoll.c

Lines changed: 9 additions & 1 deletion
@@ -1996,6 +1996,14 @@ static int ep_try_send_events(struct eventpoll *ep,
 	return res;
 }
 
+static int ep_schedule_timeout(ktime_t *to)
+{
+	if (to)
+		return ktime_after(*to, ktime_get());
+	else
+		return 1;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  * event buffer.
@@ -2103,7 +2111,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 	write_unlock_irq(&ep->lock);
 
-	if (!eavail)
+	if (!eavail && ep_schedule_timeout(to))
 		timed_out = !schedule_hrtimeout_range(to, slack,
 						      HRTIMER_MODE_ABS);
 	__set_current_state(TASK_RUNNING);
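The new helper makes the decision explicit: a NULL timeout pointer means "wait indefinitely", so scheduling must still happen (return 1); otherwise the hrtimer is armed only when the absolute deadline still lies ahead, since ktime_after(*to, ktime_get()) is true exactly when *to is later than the current monotonic time. A minimal userspace analog of that check, assuming a CLOCK_MONOTONIC deadline in nanoseconds (a hypothetical helper, not kernel code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>

    /* Hypothetical analog of ep_schedule_timeout(): true only when the
     * absolute deadline is still ahead of the current monotonic time,
     * i.e. when arming a timer and sleeping is worthwhile. */
    static bool deadline_in_future(const uint64_t *deadline_ns)
    {
            struct timespec now;

            if (!deadline_ns)
                    return true; /* no deadline: wait indefinitely */

            clock_gettime(CLOCK_MONOTONIC, &now);
            return *deadline_ns > (uint64_t)now.tv_sec * 1000000000ull
                                  + (uint64_t)now.tv_nsec;
    }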
