Hey Team :)
We're using the image tag timescale/timescaledb-ha:pg13.16-ts2.15.3 with Patroni. Since the last update to that tag (which came with an upgrade to Patroni 4.0.2 and the STOPSIGNAL change from SIGTERM to SIGINT, see #492), the "delete/stop" commands from Kubernetes no longer lead to a graceful shutdown. Nothing happens in the pod's processes, and the pod is then forcefully killed once terminationGracePeriodSeconds expires.
Here are the relevant processes running inside the pod for a replica of a three-node HA setup:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 1 0.0 0.2 50212 34904 ? Ss 10:00 0:00 /usr/bin/python3 /usr/bin/patroni /etc/timescaledb/patroni.yaml
postgres 15 0.1 0.2 583492 37044 ? Sl 10:00 0:18 /usr/bin/python3 /usr/bin/patroni /etc/timescaledb/patroni.yaml
postgres 384 0.0 0.8 3786888 129736 ? S 10:17 0:00 postgres -D /var/lib/postgresql/data --config-file=/var/lib/postgresql/data/postgresql.conf --listen_addresses=0.0.0.0 --po
postgres 386 0.0 3.7 3787292 600200 ? Ss 10:17 0:09 postgres: xxx-timescaledb-xxx: startup recovering 00000096000032CE00000034
postgres 393 0.0 3.6 3787032 580960 ? Ss 10:17 0:06 postgres: xxx-timescaledb-xxx: checkpointer
postgres 394 0.0 0.2 3786888 37240 ? Ss 10:17 0:00 postgres: xxx-timescaledb-xxx: background writer
postgres 395 0.0 0.0 73260 9108 ? Ss 10:17 0:03 postgres: xxx-timescaledb-xxx: stats collector
postgres 402 0.0 0.1 3790584 28364 ? Ss 10:17 0:00 postgres: xxx-timescaledb-xxx: postgres postgres [local] idle
postgres 404 0.0 0.2 3791024 32060 ? Ss 10:17 0:02 postgres: xxx-timescaledb-xxx: postgres postgres [local] idle
postgres 2337 0.0 0.1 3790236 28708 ? Ss 12:26 0:00 postgres: xxx-timescaledb-xxx: postgres postgres [local] idle
postgres 2349 0.1 0.0 3787784 15152 ? Ss 12:26 0:11 postgres: xxx-timescaledb-xxx: walreceiver streaming 32CE/34C2B6F8
When sending a SIGINT to PID 1 or PID 15, nothing happens (we also simulated this with kill -s SIGINT <PID>). The auditd logs on the host machine show that PID 1 receives the SIGINT but PID 15 doesn't. When sending a SIGTERM, everything works as expected.
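For anyone who wants to compare the two signals in a controlled way, here is a minimal sketch of the check described above. It does not reproduce the image or Patroni itself: the child process is a hypothetical stand-in that deliberately ignores SIGINT and keeps the default SIGTERM action, so it only mimics the symptom (SIGINT has no effect, SIGTERM stops the process) and says nothing about the actual cause. The names CHILD_CODE, exits_after and demo are made up for this illustration.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the stuck process (NOT Patroni):
# it ignores SIGINT and keeps the default action for SIGTERM.
CHILD_CODE = (
    "import signal, time\n"
    "signal.signal(signal.SIGINT, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def exits_after(proc, sig, timeout=1.0):
    """Send `sig` to `proc` and report whether it exits within `timeout` seconds."""
    proc.send_signal(sig)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False


def demo():
    """Start the stand-in child, then try SIGINT followed by SIGTERM."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # Block until the child has installed its SIGINT disposition.
        proc.stdout.readline()
        results = {"SIGINT": exits_after(proc, signal.SIGINT)}
        results["SIGTERM"] = exits_after(proc, signal.SIGTERM)
        return results
    finally:
        # Make sure no child is left behind if something went wrong.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()


if __name__ == "__main__":
    print(demo())
```

Inside the real pod, the equivalent check is the kill -s SIGINT <PID> call mentioned above, followed by looking at whether the process list changes before terminationGracePeriodSeconds runs out.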
We first thought it might be a problem with Patroni, but the guys over in the Patroni Slack couldn't reproduce it, and our internal tests with this setup https://github.com/patroni/patroni/tree/master/docker also confirm that Patroni works as intended there.
Hope you can help. If you need more information, don't hesitate to ask :)
Have a great weekend