Skip to content

Commit e626cb0

Browse files
Sebastian Andrzej SiewiorKAGA-KOKO
authored andcommitted
futex: Prevent the reuse of stale pi_state
Jiri Slaby reported a futex state inconsistency resulting in -EINVAL during a lock operation for a PI futex. It requires that the a lock process is interrupted by a timeout or signal: T1 Owns the futex in user space. T2 Tries to acquire the futex in kernel (futex_lock_pi()). Allocates a pi_state and attaches itself to it. T2 Times out and removes its rt_waiter from the rt_mutex. Drops the rtmutex lock and tries to acquire the hash bucket lock to remove the futex_q. The lock is contended and T2 schedules out. T1 Unlocks the futex (futex_unlock_pi()). Finds a futex_q but no rt_waiter. Unlocks the futex (do_uncontended) and makes it available to user space. T3 Acquires the futex in user space. T4 Tries to acquire the futex in kernel (futex_lock_pi()). Finds the existing futex_q of T2 and tries to attach itself to the existing pi_state. This (attach_to_pi_state()) fails with -EINVAL because uval contains the TID of T3 but pi_state points to T1. It's incorrect to unlock the futex and make it available for user space to acquire as long as there is still an existing state attached to it in the kernel. T1 cannot hand over the futex to T2 because T2 already gave up and started to clean up and is blocked on the hash bucket lock, so T2's futex_q with the pi_state pointing to T1 is still queued. T2 observes the futex_q, but ignores it as there is no waiter on the corresponding rt_mutex and takes the uncontended path which allows the subsequent caller of futex_lock_pi() (T4) to observe that stale state. To prevent this the unlock path must dequeue all futex_q entries which point to the same pi_state when there is no waiter on the rt mutex. This requires obviously to make the dequeue conditional in the locking path to prevent a double dequeue. With that it's guaranteed that user space cannot observe an uncontended futex which has kernel state attached. Fixes: fbeb558 ("futex/pi: Fix recursive rt_mutex waiter state") Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20240118115451.0TkD_ZhB@linutronix.de Closes: https://lore.kernel.org/all/4611bcf2-44d0-4c34-9b84-17406f881003@kernel.org
1 parent 296455a commit e626cb0

File tree

2 files changed

+20
-6
lines changed

2 files changed

+20
-6
lines changed

kernel/futex/core.c

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -627,12 +627,21 @@ int futex_unqueue(struct futex_q *q)
627627
}
628628

629629
/*
630-
* PI futexes can not be requeued and must remove themselves from the
631-
* hash bucket. The hash bucket lock (i.e. lock_ptr) is held.
630+
* PI futexes can not be requeued and must remove themselves from the hash
631+
* bucket. The hash bucket lock (i.e. lock_ptr) is held.
632632
*/
633633
void futex_unqueue_pi(struct futex_q *q)
634634
{
635-
__futex_unqueue(q);
635+
/*
636+
* If the lock was not acquired (due to timeout or signal) then the
637+
* rt_waiter is removed before futex_q is. If this is observed by
638+
* an unlocker after dropping the rtmutex wait lock and before
639+
* acquiring the hash bucket lock, then the unlocker dequeues the
640+
* futex_q from the hash bucket list to guarantee consistent state
641+
* vs. userspace. Therefore the dequeue here must be conditional.
642+
*/
643+
if (!plist_node_empty(&q->list))
644+
__futex_unqueue(q);
636645

637646
BUG_ON(!q->pi_state);
638647
put_pi_state(q->pi_state);

kernel/futex/pi.c

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1135,6 +1135,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
11351135

11361136
hb = futex_hash(&key);
11371137
spin_lock(&hb->lock);
1138+
retry_hb:
11381139

11391140
/*
11401141
* Check waiters first. We do not trust user space values at
@@ -1177,12 +1178,17 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
11771178
/*
11781179
* Futex vs rt_mutex waiter state -- if there are no rt_mutex
11791180
* waiters even though futex thinks there are, then the waiter
1180-
* is leaving and the uncontended path is safe to take.
1181+
* is leaving. The entry needs to be removed from the list so a
1182+
* new futex_lock_pi() is not using this stale PI-state while
1183+
* the futex is available in user space again.
1184+
* There can be more than one task on its way out so it needs
1185+
* to retry.
11811186
*/
11821187
rt_waiter = rt_mutex_top_waiter(&pi_state->pi_mutex);
11831188
if (!rt_waiter) {
1189+
__futex_unqueue(top_waiter);
11841190
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
1185-
goto do_uncontended;
1191+
goto retry_hb;
11861192
}
11871193

11881194
get_pi_state(pi_state);
@@ -1217,7 +1223,6 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
12171223
return ret;
12181224
}
12191225

1220-
do_uncontended:
12211226
/*
12221227
* We have no kernel internal state, i.e. no waiters in the
12231228
* kernel. Waiters which are about to queue themselves are stuck

0 commit comments

Comments
 (0)