-
If this is what you mean by the process, then no, this is not the correct process. Please follow the docs for renewing and replacing custom CA certificates.
-
I did follow the doc. We use the same CA cert secrets for several Kafka clusters, and I want to test the process on a test cluster. For this purpose, I created another new cluster with the original secrets and verified that it works like all our other clusters. I then created the renewed certs and updated the secret's ca.crt value and generation as described in the doc. This is the section I followed: I paused reconciliation before editing the secrets as described, then cleared the annotation to allow reconciliation to proceed. I'm sorry, but I'm not sure what you think I did wrong.
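For reference, a minimal sketch of that pause/edit/resume cycle (the cluster name kf-kafka and namespace kf are assumptions taken from the logs further down; the actual secret edits are described elsewhere in this thread):

```sh
# Pause reconciliation before touching the CA secrets.
kubectl annotate kafka kf-kafka -n kf strimzi.io/pause-reconciliation="true"

# Edit the CA cert secret: replace the ca.crt value and bump the
# strimzi.io/ca-cert-generation annotation while reconciliation is paused.
kubectl edit secret kf-kafka-cluster-ca-cert -n kf

# Remove the annotation so the operator picks up the changes on the next reconciliation.
kubectl annotate kafka kf-kafka -n kf strimzi.io/pause-reconciliation-
```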
-
kf-kafka.yaml
-
Hey @kjbflood, does it work if you follow the steps related to replacing the private key? Specifically, when you update the ca.crt value in the ca-cert secret, first take a copy of the old ca.crt and store it in the Secret under a key named ca-YEAR-MONTH-DAYTHOUR-MINUTE-SECONDZ.crt, for example ca-2023-01-26T17-32-00Z.crt. Also update the generation ID in the ca-key secret (strimzi.io/ca-key-generation). If you increment both generation IDs, not just the cert one, it will do a rolling update to trust the new CA cert, and then generate the new certificates in a separate rolling update.
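A minimal sketch of those two steps, assuming a cluster named kf-kafka in namespace kf and the new CA chain in new-ca-chain.pem (placeholder names, not the poster's actual commands):

```sh
# 1. Keep the outgoing ca.crt under a timestamped key and install the new one.
#    Secret names follow the Strimzi convention <cluster>-cluster-ca[-cert].
OLD_CA=$(kubectl get secret kf-kafka-cluster-ca-cert -n kf -o jsonpath='{.data.ca\.crt}')
NEW_CA=$(base64 -w0 new-ca-chain.pem)   # GNU coreutils base64
kubectl patch secret kf-kafka-cluster-ca-cert -n kf --type merge \
  -p "{\"data\":{\"ca-2023-01-26T17-32-00Z.crt\":\"${OLD_CA}\",\"ca.crt\":\"${NEW_CA}\"}}"

# 2. Bump BOTH generation annotations, not just the cert one.
kubectl annotate secret kf-kafka-cluster-ca-cert -n kf --overwrite strimzi.io/ca-cert-generation=1
kubectl annotate secret kf-kafka-cluster-ca -n kf --overwrite strimzi.io/ca-key-generation=1
```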
-
Hi, I'm trying to test the process to renew my own Cluster/Clients CA certs using Strimzi 0.45.0 with Kafka 3.9.0.
We use our own intermediate/root CA and want to renew both using the original keys.
For our test, I create a new Kafka cluster using our current CA secrets, then pause reconciliation and update the secrets to use the renewed CA certs.
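For context, a sketch of how the user-supplied CA secrets are wired up for such a test cluster (cluster name kf-kafka, namespace kf, and file names are placeholders; the Kafka CR itself sets spec.clusterCa.generateCertificateAuthority: false, and the same for clientsCa):

```sh
# Create the cluster CA cert and key secrets from our own CA material.
kubectl create secret generic kf-kafka-cluster-ca-cert -n kf \
  --from-file=ca.crt=cluster-ca-chain.pem
kubectl create secret generic kf-kafka-cluster-ca -n kf \
  --from-file=ca.key=intermediate-ca.key

# Strimzi expects these labels plus initial generation annotations on user-supplied CA secrets.
kubectl label secret kf-kafka-cluster-ca-cert kf-kafka-cluster-ca -n kf \
  strimzi.io/kind=Kafka strimzi.io/cluster=kf-kafka
kubectl annotate secret kf-kafka-cluster-ca-cert -n kf strimzi.io/ca-cert-generation=0
kubectl annotate secret kf-kafka-cluster-ca -n kf strimzi.io/ca-key-generation=0
```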
During testing, connection is lost between the Kafka brokers and Zookeeper nodes, and the reconciliation for the Kafka brokers gets stuck, requiring manual broker pod restarts for brokers 2 and 3.
Our current CA cert secrets contain our intermediate and root CA certs, and the CA key secrets contain the intermediate key. To renew the certs, I renew the root CA cert using the same details as the original and the original root CA key. I then use the new root CA cert and original root key, along with the original intermediate key, to renew the intermediate cert. After creating the new root and intermediate certs, I concatenate the PEM files, base64 encode them, update the ca-cert secret, and update the ca-cert-generation value (to "1").
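For illustration, the renewal roughly follows this shape (subjects, validity periods, and file names below are placeholders, not our real values):

```sh
# Renew the root CA cert with the original root key (self-signed, same subject).
openssl req -x509 -new -key root-ca.key -subj "/CN=Example Root CA" \
  -days 3650 -sha256 -out root-ca-new.pem

# Re-sign the intermediate cert with the original intermediate key and the renewed root.
openssl req -new -key intermediate-ca.key -subj "/CN=Example Intermediate CA" \
  -out intermediate.csr
openssl x509 -req -in intermediate.csr -CA root-ca-new.pem -CAkey root-ca.key \
  -CAcreateserial -days 1825 -sha256 -out intermediate-ca-new.pem \
  -extfile <(printf "basicConstraints=critical,CA:TRUE\nkeyUsage=critical,keyCertSign,cRLSign\n")

# Concatenate the chain and base64 encode it for the ca.crt value in the ca-cert secret.
cat intermediate-ca-new.pem root-ca-new.pem > ca-chain.pem
base64 -w0 ca-chain.pem > ca-chain.pem.b64
```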
During the next reconciliation, Strimzi updates the ZK node certs, then restarts the ZK pods. After the second ZK pod restarts, all broker pods start to have connection issues with ZK, displaying "certificate_unknown" errors:
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
After it restarts all ZK pods, the operator creates new broker certs, but then cannot connect to the Kafka broker pods, displaying errors such as:
2025-07-14 13:54:45 ERROR NetworkClient:839 - [AdminClient clientId=adminclient-890] Connection to node -2 (kf-kafka-kafka-1.kf-kafka-kafka-brokers.kf.svc.cluster.local/10.245.3.182:9091) failed authentication due to: Failed to process post-handshake messages
Eventually, one broker will be force rolled:
2025-07-14 13:56:53 WARN KafkaRoller:498 - Reconciliation #1330(watch) Kafka(kf/kf-kafka): Pod kf-kafka-kafka-0/0 will be force-rolled, due to error: Error while trying to determine the cluster controller from pod kf-kafka-kafka-0, caused by:Failed to process post-handshake messages
After that, the operator can sometimes restart the other brokers, but more often it encounters reconciliation errors due to ISR or ForceableProblem issues:
2025-07-14 15:24:36 INFO KafkaRoller:388 - Reconciliation #307(watch) Kafka(kf/kf-kafka): Will temporarily skip verifying pod kf-kafka-kafka-0/0 is up-to-date due to ForceableProblem: Pod kf-kafka-kafka-0 is the active controller and there are other pods to verify first, retrying after at least 250ms
After the Kafka pods are restarted, either automatically or manually, the operator finishes rolling the entity operator and exporter pods, and reconciliation completes.
During the update, a simple client configured with a KafkaUser secret initially starts timing out, then has SSL handshake issues until it is restarted after the update.
FWIW: If I configure the cluster with only the root CA and key, the replacement process works as I'd expect with the new root CA/original key.
My questions:
I've included relevant sections from a broker log and the operator log:
broker.log
operator.log