Commit 3134c5a

Replace a timeout task with timedwait()
According to a stacktrace from a hung CI job this task was causing the process to hang before exiting:

```julia
InterruptException()
_jl_mutex_unlock at C:/workdir/src\threading.c:1012
jl_mutex_unlock at C:/workdir/src\julia_locks.h:80 [inlined]
ijl_task_get_next at C:/workdir/src\scheduler.c:458
poptask at .\task.jl:1163
wait at .\task.jl:1172
task_done_hook at .\task.jl:839
jfptr_task_done_hook_98752.1 at C:\hostedtoolcache\windows\julia\nightly\x64\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:2233 [inlined]
jl_finish_task at C:/workdir/src\task.c:338
start_task at C:/workdir/src\task.c:1274
From worker 82: fatal: error thrown and no exception handler available.
Unhandled Task ERROR: InterruptException:
Stacktrace:
 [1] poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
   @ Base .\task.jl:1163
 [2] wait()
   @ Base .\task.jl:1172
 [3] wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
   @ Base .\condition.jl:141
 [4] wait
   @ .\condition.jl:136 [inlined]
 [5] put_buffered(c::Channel{Any}, v::Int64)
   @ Base .\channels.jl:420
 [6] put!(c::Channel{Any}, v::Int64)
   @ Base .\channels.jl:398
 [7] put!(rv::DistributedNext.RemoteValue, args::Int64)
   @ DistributedNext D:\a\DistributedNext.jl\DistributedNext.jl\src\remotecall.jl:703
 [8] (::DistributedNext.var"#create_worker##11#create_worker##12"{DistributedNext.RemoteValue, Float64})()
   @ DistributedNext D:\a\DistributedNext.jl\DistributedNext.jl\src\cluster.jl:721
```

Replaced it with a call to `timedwait()`, which has the advantage of being a lot simpler than an extra task.
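
The pattern described above can be tried in isolation. This is a minimal sketch, not the code from `create_worker`: `join_notification` is a hypothetical stand-in for `rr_ntfy_join` (a plain `Channel` rather than a `RemoteValue`), and the 0.5 second timeout is an illustrative value. It relies only on `Base.timedwait`, which polls a predicate and returns `:ok` or `:timed_out`.

```julia
# Hypothetical stand-in for rr_ntfy_join: a buffered channel that a worker
# would write to once it has joined.
join_notification = Channel{Any}(1)
timeout = 0.5  # seconds; illustrative value, not taken from the commit

# Nothing has written to the channel yet, so the predicate never becomes
# true and timedwait gives up after `timeout` seconds.
first_status = timedwait(() -> isready(join_notification), timeout)

# Once a value is available, the predicate succeeds and timedwait returns :ok.
put!(join_notification, 1)
second_status = timedwait(() -> isready(join_notification), timeout)

println((first_status, second_status))
```

Because `timedwait` only checks `isready`, no helper task has to `put!` a sentinel value into the channel to signal the timeout. In the stacktrace above, the interrupted task is inside exactly that kind of `put!` call.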
1 parent 90aba40 commit 3134c5a

2 files changed: +7 -10 lines changed

docs/src/_changelog.md

Lines changed: 5 additions & 0 deletions
@@ -7,6 +7,11 @@ CurrentModule = DistributedNext
 This documents notable changes in DistributedNext.jl. The format is based on
 [Keep a Changelog](https://keepachangelog.com).
 
+## Unreleased
+
+### Fixed
+- Fixed a cause of potential hangs when exiting the process ([#16]).
+
 ## [v1.0.0] - 2024-12-02
 
 ### Added

src/cluster.jl

Lines changed: 2 additions & 10 deletions
@@ -712,17 +712,9 @@ function create_worker(manager, wconfig)
     send_msg_now(w, MsgHeader(RRID(0,0), ntfy_oid), join_message)
 
     errormonitor(@async manage(w.manager, w.id, w.config, :register))
+
     # wait for rr_ntfy_join with timeout
-    timedout = false
-    errormonitor(
-        @async begin
-            sleep($timeout)
-            timedout = true
-            put!(rr_ntfy_join, 1)
-        end
-    )
-    wait(rr_ntfy_join)
-    if timedout
+    if timedwait(() -> isready(rr_ntfy_join), timeout) === :timed_out
         error("worker did not connect within $timeout seconds")
     end
     lock(client_refs) do
