-
That the Pod is killed after the termination grace period is expected; that is how it works. Different nodes might take different amounts of time to shut down, which is ultimately why it is configurable. But I think a shutdown that takes 15 minutes indicates some kind of issue that you would need to investigate.

One of the things that can cause such long shutdowns is a shutdown that starts during recovery after an unclean shutdown, as Kafka will not shut down before completing the recovery. This can create a cycle in which one unclean shutdown causes more unclean shutdowns because of the recovery. Your logs do not suggest this is the case, as they do not contain any of the recovery messages, but they are also not complete, so I cannot say for sure that this is not the issue here.

Given that you use KRaft and Kafka 3.7.0, I would also suggest considering an upgrade, as KRaft in 3.7 still has many missing features and issues. There is always a chance this is some bug that might be fixed in a newer version.
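One way to look for the recovery symptom described above is to grep the broker startup logs for recovery messages. This is only a sketch: the pod and namespace names are examples, and the exact log wording varies across Kafka versions:

```sh
# After an unclean shutdown, Kafka logs recovery activity while starting up,
# e.g. lines about a missing clean shutdown file or segment recovery.
# Pod and namespace names below are examples.
kubectl logs my-cluster-kafka-0 -n kafka | grep -iE "recover|clean shutdown"
```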
-
I have a similar scenario here on GKE (Google Cloud Platform). I started with Strimzi 0.38 / Kafka 3.6 with ZooKeeper, and the brokers were taking a long time to restart (more than 30 minutes). This results in a very long rolling restart, even with `terminationGracePeriodSeconds` configured to 300 seconds (see the sketch below). I also noticed that while one broker was still starting (not ready), another broker was already shutting down, which in my case can lead to topic creation failures. Any initiatives to resolve this scenario?

Environment:
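For reference, the grace period here is set through the pod template in the Kafka custom resource. This is a minimal sketch, and the cluster name is a placeholder:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster          # placeholder name
spec:
  kafka:
    template:
      pod:
        # How long Kubernetes waits for a graceful broker shutdown
        # before sending SIGKILL (the default is 30 seconds).
        terminationGracePeriodSeconds: 300
```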
-
Hi!
I found an unexpected behavior, and it is not clear to me whether it is Strimzi or Kafka related. I'm testing graceful shutdowns of Kafka brokers, as they were taking too long to complete in my Kafka deployment (~15 minutes per broker). After some investigation I found that the brokers' graceful shutdown was being forcefully aborted by K8s, as it was taking longer than the default `terminationGracePeriodSeconds`. I increased `terminationGracePeriodSeconds` to 5 minutes and saw that a graceful broker restart dropped to 2-3 minutes (using `kubectl delete <broker>`).

Now, I did a rolling update of the Kafka brokers, but I'm still seeing each broker take ~15 minutes to restart. In this case, I've tested triggering rolling updates both by changing some Kafka broker configurations and by adding the `strimzi.io/manual-rolling-update` annotation.
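For completeness, this is roughly how I trigger the manual roll; the resource and namespace names are examples, and depending on the Strimzi version the annotation goes on the StrimziPodSet or the StatefulSet:

```sh
# Ask the Cluster Operator to roll all brokers in this set
# on its next reconciliation.
kubectl annotate strimzipodset my-cluster-kafka \
  strimzi.io/manual-rolling-update=true -n kafka
```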
Any idea why a Kafka broker restart via `kubectl delete` takes 2-3 minutes, whereas a restart managed by Strimzi still takes ~15 minutes?

Some details about my setup (let me know if more information is needed):
Kafka brokers have the following configuration overrides (everything else is the default from Strimzi):
Observations of what happens when Kafka brokers take 15 minutes to restart during a rolling restart managed by Strimzi: the brokers exceed `terminationGracePeriodSeconds` and get a SIGKILL, which explains the longer recovery time during start.
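In case it helps, a quick way to confirm the SIGKILL is to check the container's last terminated state; the pod and namespace names are examples:

```sh
# Exit code 137 (= 128 + SIGKILL) means the broker container was killed
# after exceeding terminationGracePeriodSeconds; 0 means a clean exit.
kubectl get pod my-cluster-kafka-0 -n kafka \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```

Thanks.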