KafkaConnect rolling restart behavior #11537

swapsCAPS · 2025-06-13T09:00:54Z


Hi all!

I have a question about the behavior of Strimzi Kafka Connect rebalances during rolling restarts.

While going through a Kafka Connect book, I found a section on rebalancing and how it's best to reduce the number of rebalances in order to reduce cluster strain and the number of stream pauses. It also goes into how you can tweak Connect with scheduled.rebalance.max.delay.ms so that rebalances don't trigger during cluster restarts, as long as workers come back up fast enough.

When triggering a rolling restart with Strimzi Connect in a 3-replica cluster (for example by changing a spec.config or spec.template.pod.metadata.annotations value in a KafkaConnect resource), I see that tasks are immediately rebalanced to the other workers the moment one pod terminates. Am I observing this right? Is this expected behavior?
The potential issue I see is that the moment worker 1 restarts, its tasks get rebalanced to the other workers, which will themselves be restarted shortly afterwards. This causes what looks like unnecessary rebalances, as work needs to be redistributed a couple of times during the roll.

If this is indeed the expected behavior (and it's not due to user error on my side) I would like to tweak the settings in such a way that a rebalance is not triggered if a worker comes up fast enough.
We are currently not setting scheduled.rebalance.max.delay.ms and are assuming it takes the default of 5 minutes. Our pods become Ready within 90 seconds, but we still see the behavior described above. The other rebalancing-related settings, session.timeout.ms and rebalance.timeout.ms, have sane defaults (according to the book, and I tend to agree), so I'd rather not mess with those.
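Since the default is only assumed here, one way to confirm it might be to read the value back from the worker's own startup log. The sketch below assumes the worker prints its configuration at startup as `key = value` lines (which Kafka clients and Connect normally do at INFO level); the helper name `effective_setting` is made up for illustration. Note that, as far as I know, the scheduled delay only applies when connect.protocol is `compatible` or `sessioned`, so that key is worth checking too.

```shell
# Hypothetical helper: print the last logged value of a config key.
# Reads log text on stdin and expects lines such as
#   scheduled.rebalance.max.delay.ms = 300000
effective_setting() {
    # -F treats the key as a fixed string, so the dots are not regex wildcards
    grep -F -- "$1 = " | tail -n 1 | sed 's/.* = //'
}

# Example usage against a running pod (not executed here):
#   kubectl -n kafka logs connect-cluster-connect-0 | effective_setting scheduled.rebalance.max.delay.ms
#   kubectl -n kafka logs connect-cluster-connect-0 | effective_setting connect.protocol
```

If the first command prints nothing, the config dump may have scrolled out of the retained log or been filtered by the logging level.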

Cheers!

TL;DR
What is the expected rebalance behavior during a Strimzi Kafka Connect cluster restart?
Is it possible to prevent unnecessary rebalances when rolling a Strimzi Kafka Connect cluster?

scholzj · 2025-06-14T15:41:32Z

Maintainer

What version of Strimzi are you using? What is your configuration? Logs? Etc. In general, no, I do not think the tasks are expected to be moved to other nodes during rolling updates.

5 replies

swapsCAPS Jun 16, 2025
Author

Using Strimzi 0.45.0 with an image based on 0.45.0-kafka-3.8.1 (Kafka 3.8.1).

Config

# connect-cluster.yaml
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: connect-cluster
  namespace: kafka
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  image: "<snip>:0.45.0-kafka-3.8.1-v12"
  replicas: 2
  bootstrapServers: my-cluster-kafka-bootstrap.kafka:9093
  tls:
    trustedCertificates:
      - secretName: my-cluster-cluster-ca-cert
        pattern: "*.crt"

  jmxOptions: {}
  config:
    config.providers: file,secrets
    config.providers.secrets.class: io.strimzi.kafka.KubernetesSecretConfigProvider
    config.providers.file.class: org.apache.kafka.common.config.provider.FileConfigProvider
    group.id: connect-cluster
    offset.storage.topic: connect-offsets.01
    config.storage.topic: connect-configs.01
    status.storage.topic: connect-status.01

    config.storage.replication.factor: -1
    offset.storage.replication.factor: -1
    status.storage.replication.factor: -1
    connector.client.config.override.policy: All

    # https://kafka.apache.org/documentation.html#connect_plugindiscovery
    plugin.discovery: only_scan

  template:
    pod:
      metadata:
        annotations:
          # Bump this value to trigger a restart
          trigger-restart: "2025-06-16T14:43:07"
  logging:
    type: inline
    loggers:
      connect.root.logger.level: INFO

# connectors.yaml
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: mysql-connector-10
  namespace: kafka
  labels:
    strimzi.io/cluster: connect-cluster
spec:
  class: io.confluent.connect.jdbc.JdbcSourceConnector
  tasksMax: 1
  autoRestart:
    enabled: true
  config:
    errors.log.enable: true
    errors.log.include.messages: true
    errors.tolerance: none

    # <snip>

---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: oracle-connector-01
  namespace: kafka
  labels:
    strimzi.io/cluster: connect-cluster
spec:
  class: io.confluent.connect.jdbc.JdbcSourceConnector
  tasksMax: 1
  autoRestart:
    enabled: true
  config:
    errors.log.enable: true
    errors.log.include.messages: true
    errors.tolerance: none
    # <snip>

Log aggregation scripts:

Pods

k logs -f --tail 100000 -n kafka connect-cluster-connect-0 --prefix | grep -E "Kafka Connect worker initializing|Kafka Connect stopped|rebalance" > connect-cluster-connect-0.log
echo "DONE with first part"
sleep 10
k logs -f --tail 100000 -n kafka connect-cluster-connect-0 --prefix | grep -E "Kafka Connect worker initializing|Kafka Connect stopped|rebalance" >> connect-cluster-connect-0.log

Continuous port forwarding

while true; do
    k -n kafka port-forward svc/connect-cluster-connect-api 8083:8083
    sleep 1
done

Connector status log

rm -f connector-status.log
while true; do
    now=$(date -u "+%Y-%m-%dT%H:%M:%S")
    echo "$now" >> connector-status.log
    curl -s 'localhost:8083/connectors?expand=status&expand=info' | jq '. | map_values(.status)' >> connector-status.log
    sleep 5
done

Trigger restart (Am I doing something erroneous here?)

gsed -i "s/trigger-restart.*/trigger-restart: \"$(date -u '+%Y-%m-%dT%H:%M:%S')\"/g" connect-cluster.yaml
k apply -f connect-cluster.yaml

I'm not sure how much use the logs will be. I have grepped fairly aggressively, so please let me know if I should share more.
In connector-status.log it is quite apparent that right after I trigger a restart, worker 1 starts running both tasks.
The restart was triggered at 2025-06-16T14:43:07. The worker was running both tasks at 2025-06-16T14:43:21 (connector-status.log#134).
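To put the two timestamps above next to the assumed 5-minute default, the gap can be worked out with plain shell arithmetic. This is a small sketch; the helper names are made up for illustration, and it only handles two timestamps on the same day.

```shell
# Hypothetical helpers: seconds between two same-day timestamps
# of the form YYYY-MM-DDTHH:MM:SS.
to_seconds() {
    t=${1#*T}          # drop the date part, keep HH:MM:SS
    h=${t%%:*}
    rest=${t#*:}
    m=${rest%%:*}
    s=${rest#*:}
    # strip one leading zero so values like 08 are not parsed as octal
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

seconds_between() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed=$(seconds_between "2025-06-16T14:43:07" "2025-06-16T14:43:21")
echo "tasks moved after ${elapsed}s (assumed default delay: 300s)"
```

So the tasks moved 14 seconds after the restart was triggered, far short of the 300 seconds a 5-minute delay would allow.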

What I also find suspicious is the "with rebalance delay: 0" part of the various "Joined group at generation" messages.

There might be a better way to debug this situation. Simple Unix tooling is what I resorted to for now, but I would be interested in suggestions for improvement. I could theoretically spin up a Prometheus instance to generate some graphs.
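One small improvement that stays within the same curl and jq tooling might be to print one line per task with the worker it is running on, which makes task moves easier to spot than the full status JSON. This is only a sketch: the function name `task_placement` is made up, and it assumes the response shape of `/connectors?expand=status`, where each connector has a `status.tasks` array whose entries carry `id`, `state` and `worker_id`.

```shell
# Hypothetical helper: read the JSON from /connectors?expand=status on stdin
# and print "<connector> task <id>: <state> on <worker_id>" for every task.
task_placement() {
    jq -r 'to_entries[]
        | .key as $connector
        | .value.status.tasks[]
        | "\($connector) task \(.id): \(.state) on \(.worker_id)"'
}

# Example usage with the port-forward from above running (not executed here):
#   curl -s 'localhost:8083/connectors?expand=status' | task_placement
```

Running this every few seconds with a timestamp, as in the status loop above, should show the moment a task's worker_id changes.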

connect-cluster-connect-0.log
connect-cluster-connect-1.log
connector-status.log

swapsCAPS Jul 3, 2025
Author

Did some more experiments using the following instead of updating a template.pod.metadata.annotations value, but no luck:

kubectl -n kafka annotate strimzipodsets.core.strimzi.io connect-cluster-connect strimzi.io/manual-rolling-update="true"

After reconciliation triggers, pod-0 restarts. A couple of seconds later, while pod-0 is still restarting, pod-1 starts running both tasks with state: "RUNNING". This happens well before the default rebalance max delay of 5 minutes.
I would expect no rebalances while pod-0 is restarting (within 5 minutes): pod-1 keeps running 1 task, and pod-0 comes back up and continues work on its original tasks. This procedure would then repeat for the remaining pods.
Went through the docs thoroughly but can't find anything I might be missing.

I've also tried to pause reconciliation with the following, but then I'm not sure how to trigger a restart anymore 😅

kubectl -n kafka annotate kafkaconnect connect-cluster strimzi.io/pause-reconciliation="true"

scholzj Jul 3, 2025
Maintainer

Please keep in mind that Strimzi is not scheduling any tasks or connectors. That is entirely Kafka Connect's business. So I'm not really sure what your expectations from things such as pausing reconciliations are.

swapsCAPS Jul 9, 2025
Author

I am aware of that. The reason for pausing reconciliation was to potentially decrease the likelihood that the Strimzi operator was trying to (re)start Connectors on the remaining pod. But judging from your answer, this is not something the operator would do.
I don't have time to isolate and debug further, so going to leave it as is and accept the current balancing behavior.
Thanks for the answers!

scholzj Jul 9, 2025
Maintainer

Strimzi would not restart the connector or its tasks unless you enabled the auto-restart feature. But even that would not happen while the operator is rolling the Pods; it would happen only after the rolling is finished. So I do not think it is Strimzi's interaction with the REST API that triggers this (unless there is some rare configuration issue, such as multiple operators trying to operate the same Connect cluster. But even then, I would expect it to happen as a race condition rather than something reproducible).
