Implemented health probe for cleaner shutdown #31866
base: master
Conversation
It doesn't appear that Geth implements these health check endpoints (?). Even if it did, it seems a bit excessive to add a blanket 7 second delay during shutdown.
Here's the logic behind it: we have implemented a health check in proxyd (it will be committed upstream later), but it does not make sense to fire it on intervals shorter than 5 seconds. Running a longer interval also does not make sense, because it increases reaction time and slows down deployments. So 7 seconds is a near-perfect value that should be good enough for everyone, and everyone who runs any kind of load balancer needs a health check. Also, if people use k8s, there is no guarantee that SIGTERM will arrive after kube-proxy removes the node from the service; there is a race condition there. If you run Istio, it can add another 2-3 seconds of configuration replication lag on top of that. So k8s users will also benefit from this delay.
We don't want to add the timeout. If you need to coordinate shutdown with your load balancer, then your cluster management needs to first take a Geth instance out of the load balancer and then send the shutdown signal to it. Any solution implemented in Geth will always be racy and weird.
The primary purpose of adding the health check was to signal proxyd. Since proxyd doesn't offer an API to remove a node from rotation, it currently relies on tracking error rates, meaning a significant number of errors can occur before it reacts. We've implemented a health check to address this, but it's a polling mechanism, with a reaction delay during which traffic must still be served. This feature is essential, and once it's fully tested, we plan to merge the health check code upstream into proxyd. So is this PR a no-go, or are there conditions under which it could happen? If we make this delay optional via a command-line switch, will that work?
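For illustration, here is a minimal Go sketch of a polling health checker in the spirit of the one described above. It is not proxyd's actual code; the `backend` type, the `/readyz` path, and the 5-second interval are assumptions. The point is that the worst-case reaction time is roughly one poll interval, which is what the shutdown delay has to cover.

```go
// Illustrative poller: marks a backend out of rotation when its readiness
// endpoint stops returning 200. Not proxyd's real implementation.
package main

import (
	"net/http"
	"sync/atomic"
	"time"
)

type backend struct {
	url        string
	inRotation atomic.Bool // read by the proxy's request-routing path
}

// poll checks the backend's /readyz endpoint on a fixed interval. The proxy
// keeps sending traffic until a poll fails, so the backend must keep serving
// requests for at least one full interval after it starts reporting unready.
func (b *backend) poll(interval time.Duration) {
	client := &http.Client{Timeout: 2 * time.Second}
	for {
		resp, err := client.Get(b.url + "/readyz")
		healthy := err == nil && resp.StatusCode == http.StatusOK
		if resp != nil {
			resp.Body.Close()
		}
		b.inRotation.Store(healthy)
		time.Sleep(interval)
	}
}

func main() {
	b := &backend{url: "http://127.0.0.1:8551"}
	b.inRotation.Store(true)
	go b.poll(5 * time.Second) // intervals much shorter than ~5s add little benefit
	select {}                  // placeholder for the proxy's serving loop
}
```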
We don't want to add it the way it is now in the PR. It could work with a flag, like
k8s has terminationGracePeriodSeconds, which is 30 seconds by default and can be increased without a problem. ECS also has a 30-second grace period. 7 seconds is not that much on a 30-second scale. We only need to care about Docker Compose, which has a 10-second default grace timeout.
It is not so easy. The flag is a good step, and we will definitely not merge it without the flag. But I'm also trying to explore alternative solutions. For example, we might want to add an
I'm concerned that this might be difficult to align with the container lifecycle. For example, when you update a container version, you typically revise the ECS task definition. ECS then begins the update process by sending a SIGTERM signal to one of the running containers (we don't know which one will be chosen first). At that point, you have two options:
If the load balancer is integrated into the platform (like ALB or kube-proxy), the first solution is generally manageable. However, when using something like proxyd, which is not platform-native, there's no built-in mechanism to signal it, aside from the health check we're trying to implement. There's a reason SIGTERM and SIGKILL are spaced out by 30 seconds (by default): that window is intended to give time to resolve race conditions and ensure a clean shutdown, where the client stops sending requests first and the server stops accepting them only after that.
Yes, we will definitely consider merging this when it is behind a flag! But the flag should not be on by default.
@fjl Added a commit making the delay optional and off by default. Did some testing (see the PR description).
@fjl Hi Felix! What is the final conclusion? Thanks! |
Problem
When op-geth is restarted, we need to signal the load balancer to stop sending requests to the node before shutting it down, otherwise we will get errors in the RPC. The proper way to signal the load balancer is to implement a server-side readiness probe endpoint that starts failing when SIGTERM is received, allowing the load balancer to stop sending traffic to the node; then, after some reasonable timeout, in-flight requests are drained and the process exits.
This PR implements functionality to expose a /readyz readiness endpoint on the RPC server and to delay shutdown after SIGTERM (off by default, enabled via --shutdown-delay), so that the probe fails long enough for the load balancer to stop routing traffic before the server goes away.
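As a minimal sketch (assuming a plain net/http server; this is not the PR's actual code), the mechanism described above looks roughly like this: the readiness endpoint starts failing once SIGTERM arrives, the process keeps serving for the configured delay, and only then drains in-flight requests. Names like `terminating` and `shutdownDelay` are illustrative.

```go
// Illustrative sketch of the SIGTERM -> failing probe -> delay -> drain flow.
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	var terminating atomic.Bool
	shutdownDelay := 7 * time.Second // the delay discussed in this PR

	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if terminating.Load() {
			// Tell the load balancer to take this node out of rotation.
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	srv := &http.Server{Addr: ":8551", Handler: mux}
	go srv.ListenAndServe()

	// Wait for SIGTERM/SIGINT.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, syscall.SIGINT)
	<-sig

	// Start failing the readiness probe, keep serving while the load
	// balancer reacts, then drain in-flight requests and exit.
	terminating.Store(true)
	time.Sleep(shutdownDelay)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	srv.Shutdown(ctx)
}
```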
Tests
Testing was done locally by launching op-geth, hitting /readyz on the RPC port with fortio, interrupting op-geth with Ctrl-C, and watching the readyz probe fail while op-geth waits for 7 seconds before shutting down the server.
fortio load -c 1 -qps 1 -allow-initial-errors -n 10000 http://127.0.0.1:8551/readyz
op-geth output:
Additional context
UPDATE
More testing after making the shutdown delay optional:
build/bin/geth --shutdown-delay 10s
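For context, here is a hedged sketch of how such a duration flag could be declared with urfave/cli/v2 (the CLI library geth uses). The flag name matches the command above, but the wiring shown here is an assumption, not the PR's actual code.

```go
// Sketch only: defining a --shutdown-delay duration flag with urfave/cli/v2.
package main

import (
	"fmt"
	"os"

	"github.com/urfave/cli/v2"
)

func main() {
	app := &cli.App{
		Flags: []cli.Flag{
			&cli.DurationFlag{
				Name:  "shutdown-delay",
				Usage: "Time to keep failing the readiness probe before stopping RPC servers",
				Value: 0, // off by default, as requested by the reviewers
			},
		},
		Action: func(ctx *cli.Context) error {
			delay := ctx.Duration("shutdown-delay")
			fmt.Println("shutdown delay:", delay)
			// ... pass delay into the node's shutdown path ...
			return nil
		},
	}
	app.Run(os.Args)
}
```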