
Commit 1ceb635

andrebsguedes authored and stevengj committed
ReentrantLock: wakeup a single task on unlock and add a short spin (#56814)
I propose a change in the implementation of the `ReentrantLock` to improve its overall throughput for short critical sections and fix the quadratic wake-up behavior where each unlock schedules **all** waiting tasks on the lock's wait queue. This implementation follows the same principles as the `Mutex` in the [parking_lot](https://github.com/Amanieu/parking_lot/tree/master) Rust crate, which is based on WebKit's [WTF::ParkingLot](https://webkit.org/blog/6161/locking-in-webkit/) class. Only the basic working principle is implemented here; further improvements, such as eventual fairness, will be proposed separately.

The gist of the change is that we add one extra state to the lock, essentially going from:

```
0x0 => The lock is not locked
0x1 => The lock is locked by exactly one task. No other task is waiting for it.
0x2 => The lock is locked and some other task tried to lock but failed (conflict)
```

To:

```
# PARKED_BIT | LOCKED_BIT | Description
#     0      |     0      | The lock is not locked, nor is anyone waiting for it.
# -----------+------------+------------------------------------------------------------------
#     0      |     1      | The lock is locked by exactly one task. No other task is
#            |            | waiting for it.
# -----------+------------+------------------------------------------------------------------
#     1      |     0      | The lock is not locked. One or more tasks are parked.
# -----------+------------+------------------------------------------------------------------
#     1      |     1      | The lock is locked by exactly one task. One or more tasks are
#            |            | parked waiting for the lock to become available.
#            |            | In this state, PARKED_BIT is only ever cleared when the cond_wait lock
#            |            | is held (i.e. on unlock). This ensures that
#            |            | we never end up in a situation where there are parked tasks but
#            |            | PARKED_BIT is not set (which would result in those tasks
#            |            | potentially never getting woken up).
```

In the current implementation we must schedule all tasks to cause a conflict (state 0x2), because on unlock we only notify tasks if the lock is in the conflict state. This means that with high contention and a short critical section, the tasks will effectively be spinning in the scheduler queue. With the extra state, the proposed implementation has enough information to know whether there are other tasks to be notified, which means we can always notify one task at a time while preserving the optimized path of not notifying when no tasks are waiting. To improve throughput for short critical sections we also introduce a bounded amount of spinning before attempting to park.

### Results

Not spinning on the scheduler queue greatly reduces the CPU utilization of the following example:

```julia
function example()
    lock = ReentrantLock()
    @sync begin
        for i in 1:10000
            Threads.@spawn begin
                @lock lock begin
                    sleep(0.001)
                end
            end
        end
    end
end

@time example()
```

Current:

```
28.890623 seconds (101.65 k allocations: 7.646 MiB, 0.25% compilation time)
```

![image](https://github.com/user-attachments/assets/dbd6ce57-c760-4f5a-b68a-27df6a97a46e)

Proposed:

```
22.806669 seconds (101.65 k allocations: 7.814 MiB, 0.35% compilation time)
```

![image](https://github.com/user-attachments/assets/b0254180-658d-4493-86d3-dea4c500b5ac)

In a micro-benchmark where 8 threads contend for a single lock with a very short critical section we see a ~2x improvement.
Current:

```
8-element Vector{Int64}:
 6258688
 5373952
 6651904
 6389760
 6586368
 3899392
 5177344
 5505024
Total iterations: 45842432
```

Proposed:

```
8-element Vector{Int64}:
 12320768
 12976128
 10354688
 12845056
  7503872
 13598720
 13860864
 11993088
Total iterations: 95453184
```

~~In the uncontended scenario the extra bookkeeping causes a 10% throughput reduction:~~ EDIT: I reverted `_trylock` to the simple case to recover the uncontended throughput, and now both implementations are in the same ballpark (without hurting the numbers above). In the uncontended scenario:

Current:

```
Total iterations: 236748800
```

Proposed:

```
Total iterations: 237699072
```

Closes #56182
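The exact harness behind these iteration counts is not part of this commit. A minimal sketch of such a contention micro-benchmark might look like the following; the function name `contended_iters`, the one-second duration, and the stop flag are illustrative assumptions, and it is meant to be run with several threads (e.g. `julia -t 8`):

```julia
# Hypothetical harness (not from the commit): N tasks hammer one ReentrantLock
# around a tiny critical section for a fixed duration, then report per-task and
# total iteration counts.
function contended_iters(ntasks::Int = 8; duration::Real = 1.0)
    lk = ReentrantLock()
    counts = zeros(Int, ntasks)
    stop = Threads.Atomic{Bool}(false)
    @sync begin
        for i in 1:ntasks
            Threads.@spawn begin
                n = 0
                while !stop[]
                    @lock lk begin
                        n += 1          # very short critical section
                    end
                end
                counts[i] = n
            end
        end
        sleep(duration)   # let the workers run for a while
        stop[] = true     # then ask them to finish
    end
    return counts, sum(counts)
end

counts, total = contended_iters()
println(counts)
println("Total iterations: ", total)
```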
1 parent c4e69f7 commit 1ceb635

File tree

2 files changed: +106 -23 lines changed


base/lock.jl

Lines changed: 104 additions & 22 deletions
@@ -9,6 +9,14 @@ Get the currently running [`Task`](@ref).
 """
 current_task() = ccall(:jl_get_current_task, Ref{Task}, ())
 
+# This bit is set in the `havelock` of a `ReentrantLock` when that lock is locked by some task.
+const LOCKED_BIT = 0b01
+# This bit is set in the `havelock` of a `ReentrantLock` just before parking a task. A task is being
+# parked if it wants to lock the lock, but it is currently being held by some other task.
+const PARKED_BIT = 0b10
+
+const MAX_SPIN_ITERS = 40
+
 # Advisory reentrant lock
 """
     ReentrantLock()
@@ -43,7 +51,28 @@ mutable struct ReentrantLock <: AbstractLock
     # offset32 = 20, offset64 = 24
     reentrancy_cnt::UInt32
     # offset32 = 24, offset64 = 28
-    @atomic havelock::UInt8 # 0x0 = none, 0x1 = lock, 0x2 = conflict
+    #
+    # This atomic integer holds the current state of the lock instance. Only the two lowest bits
+    # are used. See `LOCKED_BIT` and `PARKED_BIT` for the bitmask for these bits.
+    #
+    # # State table:
+    #
+    # PARKED_BIT | LOCKED_BIT | Description
+    #     0      |     0      | The lock is not locked, nor is anyone waiting for it.
+    # -----------+------------+------------------------------------------------------------------
+    #     0      |     1      | The lock is locked by exactly one task. No other task is
+    #            |            | waiting for it.
+    # -----------+------------+------------------------------------------------------------------
+    #     1      |     0      | The lock is not locked. One or more tasks are parked.
+    # -----------+------------+------------------------------------------------------------------
+    #     1      |     1      | The lock is locked by exactly one task. One or more tasks are
+    #            |            | parked waiting for the lock to become available.
+    #            |            | In this state, PARKED_BIT is only ever cleared when the cond_wait lock
+    #            |            | is held (i.e. on unlock). This ensures that
+    #            |            | we never end up in a situation where there are parked tasks but
+    #            |            | PARKED_BIT is not set (which would result in those tasks
+    #            |            | potentially never getting woken up).
+    @atomic havelock::UInt8
     # offset32 = 28, offset64 = 32
     cond_wait::ThreadSynchronizer # 2 words
     # offset32 = 36, offset64 = 48
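A small standalone snippet (not part of this diff) makes the state table above concrete; it decodes a state byte using the same two constants introduced in the hunk before it:

```julia
# Illustrative only: decode the two low bits of a havelock-style state byte.
const LOCKED_BIT = 0b01
const PARKED_BIT = 0b10

function describe_state(state::UInt8)
    locked = (state & LOCKED_BIT) != 0
    parked = (state & PARKED_BIT) != 0
    locked && parked && return "locked; one or more tasks parked"
    locked && return "locked; no waiters"
    parked && return "unlocked; parked tasks remain to be woken"
    return "unlocked; no waiters"
end

describe_state(0b00)  # "unlocked; no waiters"
describe_state(0b11)  # "locked; one or more tasks parked"
```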
@@ -112,7 +141,7 @@ function islocked end
 # `ReentrantLock`.
 
 function islocked(rl::ReentrantLock)
-    return (@atomic :monotonic rl.havelock) != 0
+    return (@atomic :monotonic rl.havelock) & LOCKED_BIT != 0
 end
 
 """
@@ -136,17 +165,15 @@ function trylock end
 @inline function trylock(rl::ReentrantLock)
     ct = current_task()
     if rl.locked_by === ct
-        #@assert rl.havelock !== 0x00
         rl.reentrancy_cnt += 0x0000_0001
         return true
     end
     return _trylock(rl, ct)
 end
 @noinline function _trylock(rl::ReentrantLock, ct::Task)
     GC.disable_finalizers()
-    if (@atomicreplace :acquire rl.havelock 0x00 => 0x01).success
-        #@assert rl.locked_by === nothing
-        #@assert rl.reentrancy_cnt === 0
+    state = (@atomic :monotonic rl.havelock) & PARKED_BIT
+    if (@atomicreplace :acquire rl.havelock state => (state | LOCKED_BIT)).success
         rl.reentrancy_cnt = 0x0000_0001
         @atomic :release rl.locked_by = ct
         return true
@@ -168,23 +195,69 @@ Each `lock` must be matched by an [`unlock`](@ref).
     trylock(rl) || (@noinline function slowlock(rl::ReentrantLock)
         Threads.lock_profiling() && Threads.inc_lock_conflict_count()
         c = rl.cond_wait
-        lock(c.lock)
-        try
-            while true
-                if (@atomicreplace rl.havelock 0x01 => 0x02).old == 0x00 # :sequentially_consistent ? # now either 0x00 or 0x02
-                    # it was unlocked, so try to lock it ourself
-                    _trylock(rl, current_task()) && break
-                else # it was locked, so now wait for the release to notify us
-                    wait(c)
+        ct = current_task()
+        iteration = 1
+        while true
+            state = @atomic :monotonic rl.havelock
+            # Grab the lock if it isn't locked, even if there is a queue on it
+            if state & LOCKED_BIT == 0
+                GC.disable_finalizers()
+                result = (@atomicreplace :acquire :monotonic rl.havelock state => (state | LOCKED_BIT))
+                if result.success
+                    rl.reentrancy_cnt = 0x0000_0001
+                    @atomic :release rl.locked_by = ct
+                    return
                 end
+                GC.enable_finalizers()
+                continue
             end
-        finally
-            unlock(c.lock)
+
+            if state & PARKED_BIT == 0
+                # If there is no queue, try spinning a few times
+                if iteration <= MAX_SPIN_ITERS
+                    Base.yield()
+                    iteration += 1
+                    continue
+                end
+
+                # If still not locked, try setting the parked bit
+                @atomicreplace :monotonic :monotonic rl.havelock state => (state | PARKED_BIT)
+            end
+
+            # lock the `cond_wait`
+            lock(c.lock)
+
+            # Last check before we wait to make sure `unlock` did not win the race
+            # to the `cond_wait` lock and cleared the parked bit
+            state = @atomic :acquire rl.havelock
+            if state != LOCKED_BIT | PARKED_BIT
+                unlock(c.lock)
+                continue
+            end
+
+            # It was locked, so now wait for the unlock to notify us
+            wait_no_relock(c)
+
+            # Loop back and try locking again
+            iteration = 1
         end
     end)(rl)
     return
 end
 
+function wait_no_relock(c::GenericCondition)
+    ct = current_task()
+    _wait2(c, ct)
+    token = unlockall(c.lock)
+    try
+        return wait()
+    catch
+        ct.queue === nothing || list_deletefirst!(ct.queue, ct)
+        rethrow()
+    end
+end
+
+
 """
     unlock(lock)
@@ -201,18 +274,27 @@ internal counter and return immediately.
         rl.reentrancy_cnt = n
         if n == 0x0000_00000
            @atomic :monotonic rl.locked_by = nothing
-            if (@atomicswap :release rl.havelock = 0x00) == 0x02
+            result = (@atomicreplace :release :monotonic rl.havelock LOCKED_BIT => 0x00)
+            if result.success
+                return true
+            else
                 (@noinline function notifywaiters(rl)
                     cond_wait = rl.cond_wait
                     lock(cond_wait)
-                    try
-                        notify(cond_wait)
-                    finally
-                        unlock(cond_wait)
+
+                    notify(cond_wait, all=false)
+                    if !isempty(cond_wait.waitq)
+                        @atomic :release rl.havelock = PARKED_BIT
+                    else
+                        # We may have won the race to the `cond_wait` lock as a task was about to park
+                        # but we unlock anyway as any parking task will retry
+                        @atomic :release rl.havelock = 0x00
                     end
+
+                    unlock(cond_wait)
                 end)(rl)
+                return true
             end
-            return true
         end
         return false
     end)(rl) && GC.enable_finalizers()
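To summarize the mechanism in the two hunks above, the following is a simplified, self-contained model of the spin-then-park acquire loop and the wake-one unlock. It is purely illustrative: it is built on `Threads.Atomic` and `Threads.Condition` rather than Base's internals, skips reentrancy and finalizer handling, and the names `ToyLock`, `toy_lock!`, and `toy_unlock!` are made up for this sketch.

```julia
# Illustrative model only: a non-reentrant lock with the same 2-bit state idea.
const TOY_LOCKED = UInt8(0b01)
const TOY_PARKED = UInt8(0b10)
const TOY_SPINS  = 40

struct ToyLock
    state::Threads.Atomic{UInt8}   # low bits: TOY_LOCKED | TOY_PARKED
    cond::Threads.Condition        # where contending tasks park
end
ToyLock() = ToyLock(Threads.Atomic{UInt8}(0x00), Threads.Condition())

function toy_lock!(l::ToyLock)
    spins = 0
    while true
        s = l.state[]
        if (s & TOY_LOCKED) == 0
            # Lock is free: try to take it while preserving the parked bit.
            Threads.atomic_cas!(l.state, s, s | TOY_LOCKED) === s && return
            continue
        end
        if (s & TOY_PARKED) == 0
            # No queue yet: spin a bounded number of times before parking.
            if spins < TOY_SPINS
                spins += 1
                yield()
                continue
            end
            Threads.atomic_cas!(l.state, s, s | TOY_PARKED)
            continue
        end
        # Park, but re-check the state under the condition's lock so that an
        # unlock racing with us cannot clear TOY_PARKED and then be missed.
        lock(l.cond)
        try
            if l.state[] == TOY_LOCKED | TOY_PARKED
                wait(l.cond)
            end
        finally
            unlock(l.cond)
        end
        spins = 0
    end
end

function toy_unlock!(l::ToyLock)
    # Fast path: nobody is parked, so just clear the locked bit.
    Threads.atomic_cas!(l.state, TOY_LOCKED, UInt8(0)) === TOY_LOCKED && return
    # Slow path: wake exactly one parked task and record whether any remain.
    lock(l.cond)
    try
        notify(l.cond; all=false)
        l.state[] = isempty(l.cond.waitq) ? UInt8(0) : TOY_PARKED
    finally
        unlock(l.cond)
    end
    return
end
```

The essential points mirrored from the patch are that the parked bit is only ever cleared while holding the condition's lock, and that unlock wakes at most one waiter instead of rescheduling the whole queue.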

test/threads.jl

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,8 @@ let lk = ReentrantLock()
     t2 = @async (notify(c2); trylock(lk))
     wait(c1)
     wait(c2)
-    @test t1.queue === lk.cond_wait.waitq
+    # wait for the task to park in the queue (it may be spinning)
+    @test timedwait(() -> t1.queue === lk.cond_wait.waitq, 1.0) == :ok
     @test t2.queue !== lk.cond_wait.waitq
     @test istaskdone(t2)
     @test !fetch(t2)
