[Linux] Prevent GC from running during process teardown (#57832)
## Context
We send signal 15 (SIGTERM) to shut down our servers.
We noticed that some of our servers that receive the termination signal
are segfaulting in GC, which leads to false alarms in our internal
monitors that track GC-related crashes.
## Hypothesis
We suspect this pathological case may be happening:
- Process receives signal 15, which is captured by the signal listener
thread.
- Signal listener initiates process' teardown (e.g. through `raise`).
- IIRC this operation is not atomic in Linux, i.e. the kernel will
gradually kill the threads, so it's possible for us to spend a few ms
in a state where some of the threads in the system are alive and some
have already been killed (this point needs some confirmation).
- With part of the process alive and part of the process dead, we try
to enter a GC and see a bunch of Julia data structures in an
intermediate/corrupted state, which leads us to crash when running the
GC.
## Mitigation
Since our main goal is to get rid of the GC crashes that happen around
server shutdown, we believe that it would be sufficient to just prevent
the last bullet point. I.e. we prevent the system from even running a GC
when we're about to kill the process, and we wait for any ongoing GC to
finish.
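
A minimal sketch of the idea, assuming a single lock guards GC entry. All
names here (`gc_lock`, `gc_try_begin`, `gc_end`, `gc_disallow_and_wait`) are
hypothetical and for illustration only; they are not the Julia runtime's
actual functions or data structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state for illustration; not the runtime's real globals. */
static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gc_done = PTHREAD_COND_INITIALIZER;
static bool gc_running = false;    /* a collection is currently in progress */
static bool gc_disallowed = false; /* teardown has started; no new GCs */

/* Called by a thread that wants to collect. Returns false if teardown has
 * begun or another collection is already running, in which case the caller
 * must not collect. */
bool gc_try_begin(void)
{
    pthread_mutex_lock(&gc_lock);
    bool ok = !gc_disallowed && !gc_running;
    if (ok)
        gc_running = true;
    pthread_mutex_unlock(&gc_lock);
    return ok;
}

/* Called by the collecting thread once the collection has finished. */
void gc_end(void)
{
    pthread_mutex_lock(&gc_lock);
    assert(gc_running);
    gc_running = false;
    pthread_cond_broadcast(&gc_done); /* wake a waiting teardown path */
    pthread_mutex_unlock(&gc_lock);
}

/* Called on the teardown path before the process is killed: block any new
 * GC from starting, then wait for an ongoing one to finish. */
void gc_disallow_and_wait(void)
{
    pthread_mutex_lock(&gc_lock);
    gc_disallowed = true;
    while (gc_running)
        pthread_cond_wait(&gc_done, &gc_lock);
    pthread_mutex_unlock(&gc_lock);
}
```

In this sketch the signal listener would call `gc_disallow_and_wait()`
right before initiating teardown, so that no thread can be inside a
collection while the kernel is killing the other threads.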
Co-debugged with @kpamnany.
(cherry picked from commit e1e3a46)