Kafka upgrade with zero downtime - topic creation not respecting min.insync #11738
Replies: 2 comments 1 reply
-
If your broker startup takes 40 minutes, there is likely some other issue with your cluster or its infrastructure, so you should look into that first. It could be storage, networking, or something completely different; it is hard to say given the lack of details.
-
I know it is not ideal, but even if I reduce the time to 10 or 5 minutes, it still won't be zero downtime. I'm not sure whether it is a design choice or a limitation that topic creation requires all brokers to be available during provisioning. Nevertheless, this doesn't seem to be highlighted anywhere in the documentation on rollouts and zero-downtime upgrades, which I believe is an oversight.
-
I have an issue with upgrading Kafka and the operator. Pods roll out one by one as they should, but because of the long startup recovery during the rollout (about 40 minutes for each replica), I have downtime for topic creation. This is crucial for me, as we support dynamic routing and topics are created by apps depending on the current configuration. The Entity Operator does not seem to respect min.insync.replicas during topic creation, so a topic cannot be provisioned if one of the replicas is unavailable at that moment.
I run clusters with 3 replicas. My Kafka config:
All my topics either have
min.insync.replicas: 2
explicitly or rely on inherited configuration from the broker. This is rather problematic, as each upgrade (operator + Kafka version + protocol version) takes about 8-9 hours on large clusters, effectively forcing us to run the infrastructure in maintenance mode every time we upgrade Strimzi.
Is there any way to handle this besides switching to something like a 5-node cluster with 3 replicas for each topic and min.insync.replicas set to 2, which seems like a huge waste of resources?
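For anyone comparing the 3-node and 5-node options, here is a rough sketch of the arithmetic. It assumes (not confirmed in this thread) that Kafka refuses to create a topic whose replication factor is larger than the number of brokers currently available, while min.insync.replicas only gates writes sent with acks=all. The function names are hypothetical and this is a simplified model, not Strimzi or Kafka code.

```python
def can_create_topic(live_brokers: int, replication_factor: int) -> bool:
    # Assumption: creation needs one live broker per replica to be placed.
    return live_brokers >= replication_factor


def can_produce_acks_all(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    # min.insync.replicas is checked on writes with acks=all.
    return in_sync_replicas >= min_insync_replicas


def rolling_restart_report(cluster_size: int,
                           replication_factor: int,
                           min_insync_replicas: int) -> dict:
    """Model the moment when exactly one broker is down for a rolling restart."""
    live_brokers = cluster_size - 1
    # Worst case for an existing partition: the restarting broker hosts a replica.
    worst_case_isr = replication_factor - 1
    return {
        "can_create": can_create_topic(live_brokers, replication_factor),
        "can_produce": can_produce_acks_all(worst_case_isr, min_insync_replicas),
    }


# 3 brokers, 3 replicas, min.insync.replicas 2: existing topics stay writable,
# but new topics cannot be placed while one broker is restarting.
three_node = rolling_restart_report(3, 3, 2)

# 5 brokers, 3 replicas, min.insync.replicas 2: both keep working.
five_node = rolling_restart_report(5, 3, 2)
```

Under these assumptions, the 3-node case is the only one where creation is blocked during the rollout, which matches what I see. Any cluster with at least one more broker than the replication factor would pass the same check, so 4 nodes may already be enough in this model.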