Commit 53dac34

Frederic Weisbecker authored and KAGA-KOKO committed
hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
hrtimers are migrated away from the dying CPU to any online target at
the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth timers
handling tasks involved in the CPU hotplug forward progress. However
wakeups can still be performed by the outgoing CPU after
CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers
being armed. Depending on several considerations (crystal ball power
management based election, earliest timer already enqueued, timer
migration enabled or not), the target may eventually be the current
CPU even if offline. If that happens, the timer is eventually ignored.

The most notable example is RCU, which had to deal with each and every of
those wake-ups by deferring them to an online CPU, along with related
workarounds:

_ e787644 (rcu: Defer RCU kthreads wakeup when CPU is dying)
_ 9139f93 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
_ f7345cc (rcu/nocb: Fix rcuog wake-up from offline softirq)

The problem isn't confined to RCU though, as the stop machine kthread
(which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
of its work through cpu_stop_signal_done() and performs a wake up that
eventually arms the deadline server timer:

   WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
   CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
   Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
   RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
   Call Trace:
   <TASK>
    start_dl_timer
    enqueue_dl_entity
    dl_server_start
    enqueue_task_fair
    enqueue_task
    ttwu_do_activate
    try_to_wake_up
    complete
    cpu_stopper_thread

Instead of providing yet another bandaid to work around the situation,
fix it in the hrtimers infrastructure instead: always migrate away a
timer to an online target whenever it is enqueued from an offline CPU.

This will also allow to revert all the above RCU disgraceful hacks.
Fixes: 5c0930c ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
Reported-by: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/all/20250117232433.24027-1-frederic@kernel.org
Closes: 20241213203739.1519801-1-usamaarif642@gmail.com
1 parent 27af31e commit 53dac34

File tree: 2 files changed, +83 -21 lines

include/linux/hrtimer_defs.h

Lines changed: 1 addition & 0 deletions

@@ -125,6 +125,7 @@ struct hrtimer_cpu_base {
 	ktime_t				softirq_expires_next;
 	struct hrtimer			*softirq_next_timer;
 	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
+	call_single_data_t		csd;
 } ____cacheline_aligned;
kernel/time/hrtimer.c

Lines changed: 82 additions & 21 deletions

@@ -58,6 +58,8 @@
 #define HRTIMER_ACTIVE_SOFT	(HRTIMER_ACTIVE_HARD << MASK_SHIFT)
 #define HRTIMER_ACTIVE_ALL	(HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
 
+static void retrigger_next_event(void *arg);
+
 /*
  * The timer bases:
  *
@@ -111,7 +113,8 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
 		.clockid = CLOCK_TAI,
 		.get_time = &ktime_get_clocktai,
 	},
-	}
+	},
+	.csd = CSD_INIT(retrigger_next_event, NULL)
 };
 
 static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
@@ -124,6 +127,14 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
 	[CLOCK_TAI]	= HRTIMER_BASE_TAI,
 };
 
+static inline bool hrtimer_base_is_online(struct hrtimer_cpu_base *base)
+{
+	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
+		return true;
+	else
+		return likely(base->online);
+}
+
 /*
  * Functions and macros which are different for UP/SMP systems are kept in a
  * single place
@@ -178,27 +189,54 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 }
 
 /*
- * We do not migrate the timer when it is expiring before the next
- * event on the target cpu. When high resolution is enabled, we cannot
- * reprogram the target cpu hardware and we would cause it to fire
- * late. To keep it simple, we handle the high resolution enabled and
- * disabled case similar.
+ * Check if the elected target is suitable considering its next
+ * event and the hotplug state of the current CPU.
+ *
+ * If the elected target is remote and its next event is after the timer
+ * to queue, then a remote reprogram is necessary. However there is no
+ * guarantee the IPI handling the operation would arrive in time to meet
+ * the high resolution deadline. In this case the local CPU becomes a
+ * preferred target, unless it is offline.
+ *
+ * High and low resolution modes are handled the same way for simplicity.
 *
 * Called with cpu_base->lock of target cpu held.
 */
-static int
-hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
+static bool hrtimer_suitable_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base,
+				    struct hrtimer_cpu_base *new_cpu_base,
+				    struct hrtimer_cpu_base *this_cpu_base)
 {
 	ktime_t expires;
 
+	/*
+	 * The local CPU clockevent can be reprogrammed. Also get_target_base()
+	 * guarantees it is online.
	 */
+	if (new_cpu_base == this_cpu_base)
+		return true;
+
+	/*
+	 * The offline local CPU can't be the default target if the
+	 * next remote target event is after this timer. Keep the
+	 * elected new base. An IPI will we issued to reprogram
+	 * it as a last resort.
+	 */
+	if (!hrtimer_base_is_online(this_cpu_base))
+		return true;
+
 	expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
-	return expires < new_base->cpu_base->expires_next;
+
+	return expires >= new_base->cpu_base->expires_next;
 }
 
-static inline
-struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base,
-					 int pinned)
+static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base, int pinned)
 {
+	if (!hrtimer_base_is_online(base)) {
+		int cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TIMER));
+
+		return &per_cpu(hrtimer_bases, cpu);
+	}
+
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 	if (static_branch_likely(&timers_migration_enabled) && !pinned)
 		return &per_cpu(hrtimer_bases, get_nohz_timer_target());
@@ -249,8 +287,8 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
 		raw_spin_unlock(&base->cpu_base->lock);
 		raw_spin_lock(&new_base->cpu_base->lock);
 
-		if (new_cpu_base != this_cpu_base &&
-		    hrtimer_check_target(timer, new_base)) {
+		if (!hrtimer_suitable_target(timer, new_base, new_cpu_base,
+					     this_cpu_base)) {
 			raw_spin_unlock(&new_base->cpu_base->lock);
 			raw_spin_lock(&base->cpu_base->lock);
 			new_cpu_base = this_cpu_base;
@@ -259,8 +297,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
 		}
 		WRITE_ONCE(timer->base, new_base);
 	} else {
-		if (new_cpu_base != this_cpu_base &&
-		    hrtimer_check_target(timer, new_base)) {
+		if (!hrtimer_suitable_target(timer, new_base, new_cpu_base, this_cpu_base)) {
 			new_cpu_base = this_cpu_base;
 			goto again;
 		}
@@ -706,8 +743,6 @@ static inline int hrtimer_is_hres_enabled(void)
 	return hrtimer_hres_enabled;
 }
 
-static void retrigger_next_event(void *arg);
-
 /*
  * Switch to high resolution mode
 */
@@ -1195,6 +1230,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 				    u64 delta_ns, const enum hrtimer_mode mode,
 				    struct hrtimer_clock_base *base)
 {
+	struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
 	struct hrtimer_clock_base *new_base;
 	bool force_local, first;
 
@@ -1206,9 +1242,15 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 	 * and enforce reprogramming after it is queued no matter whether
 	 * it is the new first expiring timer again or not.
	 */
-	force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
+	force_local = base->cpu_base == this_cpu_base;
 	force_local &= base->cpu_base->next_timer == timer;
 
+	/*
+	 * Don't force local queuing if this enqueue happens on a unplugged
+	 * CPU after hrtimer_cpu_dying() has been invoked.
+	 */
+	force_local &= this_cpu_base->online;
+
 	/*
 	 * Remove an active timer from the queue. In case it is not queued
 	 * on the current CPU, make sure that remove_hrtimer() updates the
@@ -1238,8 +1280,27 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 	}
 
 	first = enqueue_hrtimer(timer, new_base, mode);
-	if (!force_local)
-		return first;
+	if (!force_local) {
+		/*
+		 * If the current CPU base is online, then the timer is
+		 * never queued on a remote CPU if it would be the first
+		 * expiring timer there.
+		 */
+		if (hrtimer_base_is_online(this_cpu_base))
+			return first;
+
+		/*
+		 * Timer was enqueued remote because the current base is
+		 * already offline. If the timer is the first to expire,
+		 * kick the remote CPU to reprogram the clock event.
+		 */
+		if (first) {
+			struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
+
+			smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
+		}
+		return 0;
+	}
 
	/*
	 * Timer was forced to stay on the current CPU to avoid
