Skip to content

Commit df4ad36

Browse files
committed
[lldb/linux] Fix a race in handling of simultaneous thread exits
D116372, while fixing one kind of a race, ended up creating a new one. The new issue could occur when one inferior thread exits while another thread initiates termination of the entire process (exit_group(2)). With some bad luck, we could start processing the exit notification (PTRACE_EVENT_EXIT) only to have the become unresponsive (ESRCH) in the middle of the MonitorCallback function. This function would then delete the thread from our list even though it wasn't completely dead (it stays zombified until we read the WIFEXITED event). The linux kernel will not deliver the exited event for the entire process until we process individual thread exits. In a pre-D116372 world, this wouldn't be a problem because we would read this event (even though we would not know what to do with it) with waitpid(-1). Now, when we issue invididual waitpids, this event will never be picked up, and we end up hanging. The fix for this is actually quite simple -- don't delete the thread in this situation. The thread will be deleted when the WIFEXITED event comes. This situation was kind of already tested by TestCreateDuringInstructionStep (which is how I found this problem), but it was mostly accidental, so I am also creating a dedicated test which reproduces this situation.
1 parent 95a9372 commit df4ad36

File tree

4 files changed

+63
-26
lines changed

4 files changed

+63
-26
lines changed

lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp

Lines changed: 8 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -485,34 +485,16 @@ void NativeProcessLinux::MonitorCallback(NativeThreadLinux &thread,
485485
} else {
486486
// ptrace(GETSIGINFO) failed (but not due to group-stop).
487487

488-
// A return value of ESRCH means the thread/process is no longer on the
489-
// system, so it was killed somehow outside of our control. Either way,
490-
// we can't do anything with it anymore.
491-
492-
// Stop tracking the metadata for the thread since it's entirely off the
493-
// system now.
494-
StopTrackingThread(thread);
488+
// A return value of ESRCH means the thread/process has died in the mean
489+
// time. This can (e.g.) happen when another thread does an exit_group(2)
490+
// or the entire process get SIGKILLed.
491+
// We can't do anything with this thread anymore, but we keep it around
492+
// until we get the WIFEXITED event.
495493

496494
LLDB_LOG(log,
497-
"GetSignalInfo failed: {0}, tid = {1}, status = {2}, "
498-
"status = {3}, main_thread = {4}",
499-
info_err, thread.GetID(), status, status, is_main_thread);
500-
501-
if (is_main_thread) {
502-
// Notify the delegate - our process is not available but appears to
503-
// have been killed outside our control. Is eStateExited the right
504-
// exit state in this case?
505-
SetExitStatus(status, true);
506-
SetState(StateType::eStateExited, true);
507-
} else {
508-
// This thread was pulled out from underneath us. Anything to do here?
509-
// Do we want to do an all stop?
510-
LLDB_LOG(log,
511-
"pid {0} tid {1} non-main thread exit occurred, didn't "
512-
"tell delegate anything since thread disappeared out "
513-
"from underneath us",
514-
GetID(), thread.GetID());
515-
}
495+
"GetSignalInfo({0}) failed: {1}, status = {2}, main_thread = "
496+
"{3}. Expecting WIFEXITED soon.",
497+
thread.GetID(), info_err, status, is_main_thread);
516498
}
517499
}
518500
}
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
ENABLE_THREADS := YES
2+
CXX_SOURCES := main.cpp
3+
include Makefile.rules
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
"""
2+
This test verifies the correct handling of the situation when a thread exits
3+
while another thread triggers the termination (exit) of the entire process.
4+
"""
5+
6+
import lldb
7+
from lldbsuite.test.decorators import *
8+
from lldbsuite.test.lldbtest import *
9+
import lldbsuite.test.lldbutil as lldbutil
10+
11+
12+
class ConcurrentThreadExitTestCase(TestBase):
13+
14+
mydir = TestBase.compute_mydir(__file__)
15+
NO_DEBUG_INFO_TESTCASE = True
16+
17+
@skipIf(oslist=no_match(["linux"]))
18+
def test(self):
19+
self.build()
20+
exe = self.getBuildArtifact("a.out")
21+
self.runCmd("file " + exe, CURRENT_EXECUTABLE_SET)
22+
self.expect("run", substrs=["exited with status = 47"])
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
#include "pseudo_barrier.h"
2+
#include <cstdlib>
3+
#include <thread>
4+
5+
// Use low-level exit functions to achieve predictable timing.
6+
#if defined(__linux__)
7+
#include <sys/syscall.h>
8+
#include <unistd.h>
9+
10+
void exit_thread(int status) { syscall(SYS_exit, status); }
11+
void exit_process(int status) { syscall(SYS_exit_group, status); }
12+
#else
13+
#error Unimplemented
14+
#endif
15+
16+
pseudo_barrier_t g_barrier;
17+
18+
void thread_func() {
19+
pseudo_barrier_wait(g_barrier);
20+
exit_thread(42);
21+
}
22+
23+
int main() {
24+
pseudo_barrier_init(g_barrier, 2);
25+
std::thread(thread_func).detach();
26+
27+
pseudo_barrier_wait(g_barrier);
28+
29+
exit_process(47);
30+
}

0 commit comments

Comments
 (0)