Add a drain period to server shutdown (#591)

spladug · web-flow · commit c4a815d2729c · 2021-05-17T09:59:35.000-07:00
When a baseplate-serve process receives a SIGTERM, it puts the server
into a graceful shutdown state immediately. New requests are no longer
accepted and the process continues handling in-flight requests until
they're all complete or stop_timeout seconds have elapsed.

Kubernetes won't mark the pod as TERMINATING until the server has
completed its graceful shutdown or has spent long enough shutting down
that it fails to respond to a liveness healthcheck and is forcibly shut
down.

During this intervening time, new requests will still be routed to the
pod but the server will not be listening for them, so they get dropped
on the floor.

To prevent this, we add a drain_time period that happens before the
graceful shutdown is kicked off. On SIGTERM, baseplate-serve sets a
global flag indicating shutdown has begun and, if drain_time is
configured, waits the specified time before beginning the actual
graceful shutdown. This gives the application a chance to deliberately
fail READINESS healthchecks during the drain period and get taken out of
rotation, so that by the time graceful shutdown starts it should no
longer be receiving new requests.
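
As a rough illustration of the sequence described above, here is a
standalone sketch. It is not the code in this commit: the names
run_until_shutdown, is_ready, stop_server and events are hypothetical,
and drain_time is a plain number of seconds rather than a parsed
Timespan.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ServerState:
    shutting_down: bool = False


SERVER_STATE = ServerState()


def is_ready() -> bool:
    """Hypothetical readiness probe: fail as soon as shutdown has begun."""
    return not SERVER_STATE.shutting_down


def run_until_shutdown(
    shutdown_event: threading.Event,
    drain_time: Optional[float],
    stop_server: Callable[[], None],
    events: List[object],
) -> None:
    """Wait for the shutdown signal, drain, then stop the server.

    `events` only records the order of steps for illustration.
    """
    try:
        # Block until the signal handler sets the event (e.g. on SIGTERM).
        shutdown_event.wait()

        # Flip the flag first so readiness checks start failing right away.
        SERVER_STATE.shutting_down = True

        if drain_time:
            # Keep serving while the infrastructure takes us out of rotation.
            events.append("draining")
            time.sleep(drain_time)
    finally:
        # Only now stop accepting connections and finish in-flight requests.
        events.append("stopping")
        stop_server()
```

The flag is set before the sleep so that readiness probes can start
failing while the server is still accepting connections; the drain time
therefore needs to be long enough for the infrastructure to notice the
failed probes.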
diff --git a/baseplate/server/__init__.py b/baseplate/server/__init__.py
@@ -15,9 +15,11 @@
 import socket
 import sys
 import threading
+import time
 import traceback
 import warnings
 
+from dataclasses import dataclass
 from types import FrameType
 from typing import Any
 from typing import Callable
@@ -34,6 +36,9 @@
 from baseplate import Baseplate
 from baseplate.lib.config import Endpoint
 from baseplate.lib.config import EndpointConfiguration
+from baseplate.lib.config import Optional as OptionalConfig
+from baseplate.lib.config import parse_config
+from baseplate.lib.config import Timespan
 from baseplate.lib.log_formatter import CustomJsonFormatter
 from baseplate.server import einhorn
 from baseplate.server import reloader
@@ -42,6 +47,14 @@
 logger = logging.getLogger(__name__)
 
 
+@dataclass
+class ServerState:
+    shutting_down: bool = False
+
+
+SERVER_STATE = ServerState()
+
+
 def parse_args(args: Sequence[str]) -> argparse.Namespace:
     parser = argparse.ArgumentParser(
         description=sys.modules[__name__].__doc__,
@@ -237,6 +250,8 @@ def load_app_and_run_server() -> None:
     listener = make_listener(args.bind)
     server = make_server(config.server, listener, app)
 
+    cfg = parse_config(config.server, {"drain_time": OptionalConfig(Timespan)})
+
     if einhorn.is_worker():
         einhorn.ack_startup()
 
@@ -246,13 +261,20 @@ def load_app_and_run_server() -> None:
     # clean up leftovers from initialization before we get into requests
     gc.collect()
 
-    logger.info("Listening on %s, PID:%s", listener.getsockname(), os.getpid())
+    logger.info("Listening on %s", listener.getsockname())
     server.start()
     try:
         shutdown_event.wait()
-        logger.info("Finally stopping server, PID:%s", os.getpid())
+
+        SERVER_STATE.shutting_down = True
+
+        if cfg.drain_time:
+            logger.debug("Draining inbound requests...")
+            time.sleep(cfg.drain_time.total_seconds())
     finally:
+        logger.debug("Gracefully shutting down...")
         server.stop()
+        logger.info("Exiting")
 
 
 def load_and_run_script() -> None:
diff --git a/docs/cli/serve.rst b/docs/cli/serve.rst
@@ -126,6 +126,42 @@ An example command line::
 
 .. _Stripe's Einhorn socket manager: https://github.com/stripe/einhorn
 
+Graceful shutdown
+-----------------
+
+The flow of graceful shutdown while handling live traffic looks like this:
+
+* The server receives a ``SIGTERM`` from the infrastructure.
+* The server sets ``baseplate.server.SERVER_STATE.shutting_down`` to ``True``.
+* If the ``drain_time`` setting is set in the server configuration, the server
+  will wait the specified amount of time before continuing to the next step.
+  This gives your application a chance to use the ``shutting_down`` flag in
+  healthcheck responses.
+* The server begins graceful shutdown. No new connections will be accepted. The
+  server will continue processing the existing in-flight requests until they
+  are all done or ``stop_timeout`` time has elapsed.
+* The server exits and lets the infrastructure clean up.
+
+During the period between receiving the ``SIGTERM`` and the server exiting, the
+application may still be routed new requests. To ensure requests aren't lost
+during the graceful shutdown (where they won't be listened for) your
+application should set an appropriate ``drain_time`` and use the
+``SERVER_STATE.shutting_down`` flag to fail ``READINESS`` healthchecks.
+
+For example:
+
+.. code-block:: py
+
+   def is_healthy(self, context, healthcheck_request):
+       if healthcheck_request.probe == IsHealthyProbe.LIVENESS:
+           return True
+       elif healthcheck_request.probe == IsHealthyProbe.READINESS:
+           if SERVER_STATE.shutting_down:
+               return False
+           return True
+       return True
+
+
 Debug Signal
 ------------