SIGINT not honored by Patroni anymore #499

@talpa-robin

Hey Team :)

We're using the image tag timescale/timescaledb-ha:pg13.16-ts2.15.3 with Patroni. Since the last update to that tag (which upgraded Patroni to 4.0.2 and changed the STOPSIGNAL from SIGTERM to SIGINT, see #492), the "delete/stop" commands from Kubernetes no longer lead to a graceful shutdown. Nothing happens inside the pod or its processes, and the pod is then forcefully killed once terminationGracePeriodSeconds expires.

Here are the relevant processes running inside the pod for a replica of a three-node HA setup:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres       1  0.0  0.2  50212 34904 ?        Ss   10:00   0:00 /usr/bin/python3 /usr/bin/patroni /etc/timescaledb/patroni.yaml
postgres      15  0.1  0.2 583492 37044 ?        Sl   10:00   0:18 /usr/bin/python3 /usr/bin/patroni /etc/timescaledb/patroni.yaml
postgres     384  0.0  0.8 3786888 129736 ?      S    10:17   0:00 postgres -D /var/lib/postgresql/data --config-file=/var/lib/postgresql/data/postgresql.conf --listen_addresses=0.0.0.0 --po
postgres     386  0.0  3.7 3787292 600200 ?      Ss   10:17   0:09 postgres: xxx-timescaledb-xxx: startup recovering 00000096000032CE00000034
postgres     393  0.0  3.6 3787032 580960 ?      Ss   10:17   0:06 postgres: xxx-timescaledb-xxx: checkpointer 
postgres     394  0.0  0.2 3786888 37240 ?       Ss   10:17   0:00 postgres: xxx-timescaledb-xxx: background writer 
postgres     395  0.0  0.0  73260  9108 ?        Ss   10:17   0:03 postgres: xxx-timescaledb-xxx: stats collector 
postgres     402  0.0  0.1 3790584 28364 ?       Ss   10:17   0:00 postgres: xxx-timescaledb-xxx: postgres postgres [local] idle
postgres     404  0.0  0.2 3791024 32060 ?       Ss   10:17   0:02 postgres: xxx-timescaledb-xxx: postgres postgres [local] idle
postgres    2337  0.0  0.1 3790236 28708 ?       Ss   12:26   0:00 postgres: xxx-timescaledb-xxx: postgres postgres [local] idle
postgres    2349  0.1  0.0 3787784 15152 ?       Ss   12:26   0:11 postgres: xxx-timescaledb-xxx: walreceiver streaming 32CE/34C2B6F8

When sending a SIGINT to PID 1 or PID 15, nothing happens (we also simulated this with kill -s SIGINT <PID>). The auditd logs of the host machine show that PID 1 receives the SIGINT but PID 15 doesn't. When sending a SIGTERM, everything works as expected.
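One way to narrow this down further might be to check how each Patroni process has SIGINT configured. On Linux, /proc/<pid>/status exposes the SigBlk, SigIgn, and SigCgt bitmasks, where signal number n corresponds to bit n-1. The following is only a diagnostic sketch, not a confirmed explanation of the behavior above; the helper names (parse_signal_masks, signal_disposition, report) are made up for illustration, and it assumes python3 can be run inside the pod.

```python
import signal
import sys


def parse_signal_masks(status_text):
    """Extract the SigBlk/SigIgn/SigCgt bitmasks from /proc/<pid>/status text."""
    masks = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("SigBlk", "SigIgn", "SigCgt"):
            # The kernel prints each mask as a hexadecimal string.
            masks[key] = int(value.strip(), 16)
    return masks


def signal_disposition(masks, signum):
    """Report whether signum is blocked, ignored, or caught according to masks."""
    bit = 1 << (signum - 1)  # signal n is stored in bit n-1
    return {
        "blocked": bool(masks.get("SigBlk", 0) & bit),
        "ignored": bool(masks.get("SigIgn", 0) & bit),
        "caught": bool(masks.get("SigCgt", 0) & bit),
    }


def report(pid, signum=signal.SIGINT):
    """Read /proc/<pid>/status (Linux only) and return the disposition of signum."""
    with open(f"/proc/{pid}/status") as fh:
        return signal_disposition(parse_signal_masks(fh.read()), signum)


if __name__ == "__main__":
    # Hypothetical usage inside the pod: python3 sigcheck.py 1 15
    for pid in sys.argv[1:] or ["self"]:
        print(pid, "SIGINT:", report(pid), "SIGTERM:", report(pid, signal.SIGTERM))
```

If SIGINT showed up as "ignored" or "blocked" for PID 1 or PID 15 while SIGTERM showed up as "caught", that would at least be consistent with the symptoms described here, and would be worth including in a follow-up comment.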

We first thought it might be a problem with Patroni, but the folks over in the Patroni Slack couldn't reproduce it, and our internal tests with this setup https://github.com/patroni/patroni/tree/master/docker also confirm that Patroni works as intended there.

Hope you can help. If you need more information don't hesitate to ask :)

Have a great weekend
