
Commit 5b6ef14

Improve aborting daemon upon start timeout
- Upon start timeout, abort the daemon in two stages: first abort gracefully (SIGTERM); if that times out (start_abort_timeout), then abort forcefully (SIGKILL). Previously, we aborted the daemon by sending SIGTERM without checking whether it actually terminated.
- Make timeout handling more robust. Instead of timing out at any point in the code (which could leave behind inconsistent state), only allow timing out at well-defined interruption points.
- Greatly improve test coverage.
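The second point contrasts two ways of enforcing a timeout. Ruby's standard `Timeout.timeout`, for example, raises an exception asynchronously in the running thread, so it can interrupt code anywhere, including halfway through updating state. The alternative described here is to check for a timeout only at well-defined interruption points. A minimal sketch of that idea follows; `Deadline` and `run_steps` are hypothetical names for illustration, not daemon_controller's internals.

```ruby
# Hypothetical helper, not daemon_controller's API: a deadline that is only
# consulted at explicit interruption points, never asynchronously.
class Deadline
  class Expired < StandardError; end

  def initialize(seconds)
    @expires_at = now + seconds
  end

  # Seconds left before expiry (never negative).
  def remaining
    [@expires_at - now, 0].max
  end

  # Interruption point: the only place where a timeout can surface.
  def check!
    raise Expired, "deadline expired" if remaining.zero?
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Runs each step to completion. The deadline is checked only before a step
# starts, so a step is never abandoned halfway through.
def run_steps(deadline, steps)
  completed = []
  steps.each do |name, action|
    deadline.check!
    action.call
    completed << name
  end
  [:ok, completed]
rescue Deadline::Expired
  [:timed_out, completed]
end
```

With `Deadline.new(0.3)` and three steps that each sleep 0.2 seconds, the first two steps complete and the third never starts under normal scheduling: the second step overruns the deadline but is allowed to finish, and the timeout is only reported at the next interruption point.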
1 parent a63109d commit 5b6ef14


9 files changed (+644 / -432 lines)


.github/workflows/ci.yaml

Lines changed: 2 additions & 1 deletion
```diff
@@ -26,7 +26,8 @@ jobs:
     bundler-cache: true
 
 - name: Run RSpec tests
-  run: bundle exec rake test
+  run: /usr/bin/timeout -s QUIT 120 bundle exec rake test
+  timeout-minutes: 3
   env:
     MRI_RUBY: "env RUBYOPT= RUBYLIB= /usr/bin/ruby"
 
```
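The CI step now wraps the test run in coreutils `timeout`, which sends SIGQUIT (`-s QUIT`) to `bundle exec rake test` if it is still running after 120 seconds; `timeout-minutes: 3` is a GitHub Actions step limit that acts as a backstop. The diff does not show how the suite reacts to SIGQUIT. One common Ruby pattern, sketched here as an assumption and not taken from this repository, is to trap QUIT and print every thread's backtrace, so that a hung test leaves a clue in the CI log before the process exits.

```ruby
# Hypothetical helper (not from this repository): on SIGQUIT, write each
# thread's backtrace to `io`, then exit with a failing status.
def install_quit_backtrace_dumper(io = $stderr, exit_status: 1)
  Signal.trap("QUIT") do
    Thread.list.each do |thread|
      io.puts "Thread #{thread.inspect}:"
      # backtrace is nil for threads that have already finished.
      (thread.backtrace || []).each { |frame| io.puts "    #{frame}" }
    end
    io.flush
    exit(exit_status)
  end
end
```

It would be called once, early in test setup (for example from an RSpec helper file).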

README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -500,4 +500,5 @@ API documentation
 
 Detailed API documentation is available here:
 - [Configuration options](doc/OPTIONS.md)
+- [Stop flow](doc/STOP-FLOW.md)
 - Inline comments in `lib/daemon_controller.rb`.
```

doc/OPTIONS.md

Lines changed: 19 additions & 2 deletions
```diff
@@ -71,7 +71,8 @@ Lock file to use for serializing concurrent daemon management operations.
 Command to stop the daemon with, e.g. "/etc/rc.d/nginx stop".
 
 If no stop command is given (i.e., `nil`), then will stop the daemon
-by sending signals to the PID written in the PID file.
+by sending signals to the PID written in the PID file. See
+[Stop flow](STOP-FLOW.md) for how that works.
 
 ### restart_command (default: nil)
 
```
```diff
@@ -89,7 +90,15 @@ The Proc call is not subject to the start timeout.
 Maximum amount of time (seconds) that #start may take to start
 the daemon. Since #start also waits until the daemon can be connected to,
 that wait time is counted as well. If the daemon does not start in time,
-then #start will raise an exception and also stop the daemon.
+then #start will raise an exception and also stop the daemon. See also
+[Stop flow](STOP-FLOW.md) for how that works.
+
+### start_abort_timeout (default: 10)
+
+Maximum amount of time (seconds) to wait for the daemon to terminate after
+sending SIGTERM during the start timeout flow. If the daemon does not terminate
+within this time, it will be forcefully terminated with SIGKILL. See
+[Stop flow](STOP-FLOW.md) for more details.
 
 ### stop_timeout (default: 30)
 
```
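The graceful-then-forceful sequence bounded by `start_abort_timeout` can be sketched in plain Ruby. This version assumes the daemon is still a child of the current process, so it can be reaped with `Process.wait2`; a real daemon has usually detached, in which case the wait would instead poll whether the PID is still alive (for example with `Process.kill(0, pid)`). `abort_child` is a hypothetical name, not part of daemon_controller's API.

```ruby
# Hypothetical sketch, not daemon_controller's implementation. Assumes `pid`
# is a child of this process, so it can be reaped with Process.wait2.
def abort_child(pid, abort_timeout:, poll_interval: 0.05)
  # Stage 1: graceful request, bounded by abort_timeout.
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + abort_timeout
  loop do
    # With WNOHANG, wait2 returns nil while the child is still running.
    return :terminated if Process.wait2(pid, Process::WNOHANG)
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep poll_interval
  end

  # Stage 2: SIGKILL cannot be caught or ignored, so wait without a timeout.
  Process.kill("KILL", pid)
  Process.wait2(pid)
  :killed
end
```

`abort_child(pid, abort_timeout: 10)` mirrors the documented default of 10 seconds. The second stage has no timeout, on the same assumption the stop flow documentation makes: the OS processes a SIGKILL quickly enough.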
```diff
@@ -98,6 +107,8 @@ the daemon. Since #stop also waits until the daemon is no longer running,
 that wait time is counted as well. If the daemon does not stop in time,
 then #stop will raise an exception and force stop the daemon.
 
+See [Stop flow](STOP-FLOW.md) for an overview of the entire stopping flow.
+
 ### log_file_activity_timeout (default: 10)
 
 Once a daemon has gone into the background, it will become difficult to
```
```diff
@@ -117,6 +128,12 @@ given by this option, then the daemon is assumed to have terminated with an erro
 Time interval (seconds) between pinging attempts (see `ping_command`) when waiting
 for the daemon to start.
 
+### stop_graceful_signal (default: "TERM")
+
+Signal to send to the daemon when attempting to stop it gracefully during `#stop`
+(regular stop flow). This is only used when no `stop_command` is provided.
+See [Stop flow](STOP-FLOW.md) for more details.
+
 ### dont_stop_if_pid_file_invalid (default: false)
 
 If the stop_command option is given, then normally daemon_controller will
```
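`stop_graceful_signal` is given as a signal name such as `"TERM"`, so a caller could validate it up front instead of discovering a typo only when `#stop` runs. A small sketch using Ruby's built-in `Signal.list`; the helper name and the choice to accept an optional `SIG` prefix or a symbol are assumptions, not documented daemon_controller behavior.

```ruby
# Hypothetical validator for a signal-name option such as stop_graceful_signal.
# Accepts "TERM", "SIGTERM", "term" or :term and returns the bare name ("TERM").
def normalize_signal_name(name)
  canonical = name.to_s.upcase.delete_prefix("SIG")
  # Signal.list maps bare names ("TERM", "KILL", ...) to signal numbers. It
  # also contains "EXIT" => 0, which is not a real signal, so reject it.
  unless Signal.list.key?(canonical) && canonical != "EXIT"
    raise ArgumentError, "unknown signal: #{name.inspect}"
  end
  canonical
end
```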

doc/STOP-FLOW.md

Lines changed: 19 additions & 0 deletions
```diff
@@ -0,0 +1,19 @@
+# Stop flow
+
+## Regular stop flow
+
+This flow is triggered by calling `#stop`.
+
+1. Graceful termination request: either run the stop command, or send `stop_graceful_signal` to the PID. Then wait until the daemon is gone.
+   - The stop command is only invoked, or the signal is only sent, if the PID file is valid or `dont_stop_if_pid_file_invalid` is false.
+2. Force termination upon timeout (`stop_timeout`): send SIGKILL to the PID. Then wait until the daemon is gone, and delete the PID file.
+   - No timeout here: we assume the OS processes it quickly enough.
+
+## Start timeout stop flow
+
+This flow is triggered when `#start` times out.
+
+1. Graceful termination request: send SIGTERM to the PID and wait until it's gone.
+   - The signal cannot be customized here. Rationale: this is an abnormal stop, so we don't use the stop command.
+2. Force termination upon timeout (`start_abort_timeout`) of step 1: send SIGKILL to the PID and wait until it's gone.
+   - No timeout here: we assume the OS processes a SIGKILL quickly enough.
```
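The decisions in the regular stop flow can be expressed as a pure function that returns the actions to take rather than performing them. In this sketch, `regular_stop_plan` and `StopPlan` are hypothetical, `pid` is `nil` when the PID file is missing or invalid, and the handling of `dont_stop_if_pid_file_invalid` is an assumption based on the option's name: when it is set, an invalid PID file suppresses the stop command; when it is not, the command always runs.

```ruby
# Hypothetical planner for the regular #stop flow; it returns the actions to
# take instead of performing them. `pid` is nil when the PID file is invalid.
StopPlan = Struct.new(:graceful, :on_timeout)

def regular_stop_plan(pid:, stop_command: nil, stop_graceful_signal: "TERM",
                      dont_stop_if_pid_file_invalid: false, stop_timeout: 30)
  # Step 1: graceful termination request.
  graceful =
    if stop_command
      # ASSUMPTION: with dont_stop_if_pid_file_invalid set, an invalid PID
      # file suppresses the stop command; otherwise the command always runs.
      [[:run, stop_command]] if pid || !dont_stop_if_pid_file_invalid
    elsif pid
      # No stop command: signal the PID (impossible without a valid PID).
      [[:signal, stop_graceful_signal, pid]]
    end
  graceful = Array(graceful)
  graceful << [:wait_until_gone, stop_timeout] unless graceful.empty?

  # Step 2: only used if the wait above exceeds stop_timeout. SIGKILL needs
  # a PID, and the final wait has no timeout (nil).
  on_timeout =
    if pid && !graceful.empty?
      [[:signal, "KILL", pid], [:wait_until_gone, nil], [:delete_pid_file]]
    else
      []
    end

  StopPlan.new(graceful, on_timeout)
end
```

Keeping the plan separate from its execution makes each branch easy to check without spawning any processes.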
