-
If this is what you mean by the process, then no, this is not the correct process. Please follow the docs for renewing and replacing custom CA certificates.
-
I did follow the doc. We use the same CA cert secrets for several Kafka clusters, and I want to test the process on a test cluster. For this purpose, I created another new cluster with the original secrets and verified that it works like all our other clusters. I then created the renewed certs and updated the secret's ca.crt value and generation as described in the doc. This is the section I followed: I paused reconciliation before editing the secrets as described, then cleared the annotation to allow reconciliation to proceed. I'm sorry, but I'm not sure what you think I did wrong.
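For reference, a minimal sketch of that pause/edit/resume cycle (the cluster name kf-kafka and namespace kf are assumptions taken from the logs further down; the actual secret edits are described elsewhere in this thread):

```sh
# Pause reconciliation before touching the CA secrets.
kubectl annotate kafka kf-kafka -n kf strimzi.io/pause-reconciliation="true"

# Edit the CA cert secret: replace the ca.crt value and bump the
# strimzi.io/ca-cert-generation annotation while reconciliation is paused.
kubectl edit secret kf-kafka-cluster-ca-cert -n kf

# Remove the annotation so the operator picks up the changes on the next reconciliation.
kubectl annotate kafka kf-kafka -n kf strimzi.io/pause-reconciliation-
```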
-
kf-kafka.yaml
-
Hey @kjbflood, does it work if you follow the steps related to replacing the private key? Specifically, when you update the ca.crt value in the ca-cert secret, first take a copy of the old ca.crt and store it in the Secret under a key named ca-YEAR-MONTH-DAYTHOUR-MINUTE-SECONDZ.crt, for example ca-2023-01-26T17-32-00Z.crt. Also update the generation ID in the ca-key secret (strimzi.io/ca-key-generation). If you increment both generation IDs, not just the cert one, it will do a rolling update to trust the new CA cert, and then generate the new certificates in a separate rolling update.
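A minimal sketch of those two steps, assuming a cluster named kf-kafka in namespace kf and the new CA chain in new-ca-chain.pem (placeholder names, not the poster's actual commands):

```sh
# 1. Keep the outgoing ca.crt under a timestamped key and install the new one.
#    Secret names follow the Strimzi convention <cluster>-cluster-ca[-cert].
OLD_CA=$(kubectl get secret kf-kafka-cluster-ca-cert -n kf -o jsonpath='{.data.ca\.crt}')
NEW_CA=$(base64 -w0 new-ca-chain.pem)   # GNU coreutils base64
kubectl patch secret kf-kafka-cluster-ca-cert -n kf --type merge \
  -p "{\"data\":{\"ca-2023-01-26T17-32-00Z.crt\":\"${OLD_CA}\",\"ca.crt\":\"${NEW_CA}\"}}"

# 2. Bump BOTH generation annotations, not just the cert one.
kubectl annotate secret kf-kafka-cluster-ca-cert -n kf --overwrite strimzi.io/ca-cert-generation=1
kubectl annotate secret kf-kafka-cluster-ca -n kf --overwrite strimzi.io/ca-key-generation=1
```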
-
Hi, I'm trying to test the process to renew my own Cluster/Clients CA certs using Strimzi 0.45.0 with Kafka 3.9.0.
We use our own intermediate/root CA and want to renew both using the original keys.
For our test, I create a new Kafka cluster using our current CA secrets, then pause reconciliation and update the secrets to use the renewed CA certs.
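For context, a sketch of how the user-supplied CA secrets are wired up for such a test cluster (cluster name kf-kafka, namespace kf, and file names are placeholders; the Kafka CR itself sets spec.clusterCa.generateCertificateAuthority: false, and the same for clientsCa):

```sh
# Create the cluster CA cert and key secrets from our own CA material.
kubectl create secret generic kf-kafka-cluster-ca-cert -n kf \
  --from-file=ca.crt=cluster-ca-chain.pem
kubectl create secret generic kf-kafka-cluster-ca -n kf \
  --from-file=ca.key=intermediate-ca.key

# Strimzi expects these labels plus initial generation annotations on user-supplied CA secrets.
kubectl label secret kf-kafka-cluster-ca-cert kf-kafka-cluster-ca -n kf \
  strimzi.io/kind=Kafka strimzi.io/cluster=kf-kafka
kubectl annotate secret kf-kafka-cluster-ca-cert -n kf strimzi.io/ca-cert-generation=0
kubectl annotate secret kf-kafka-cluster-ca -n kf strimzi.io/ca-key-generation=0
```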
During testing, connection is lost between the Kafka brokers and Zookeeper nodes, and the reconciliation for the Kafka brokers gets stuck, requiring manual broker pod restarts for brokers 2 and 3.
Our current CA cert secrets contain our intermediate and root CA certs, and the CA key secrets contain the intermediate key. To renew the certs, I renew the root CA cert using the same details as the original and the original root CA key. I then use the new root CA cert and original root key, along with the original intermediate key, to renew the intermediate cert. After creating the new root and intermediate certs, I concatenate the PEM files, base64 encode them, update the ca-cert secret, and update the ca-cert-generation value (to "1").
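For illustration, the renewal roughly follows this shape (subjects, validity periods, and file names below are placeholders, not our real values):

```sh
# Renew the root CA cert with the original root key (self-signed, same subject).
openssl req -x509 -new -key root-ca.key -subj "/CN=Example Root CA" \
  -days 3650 -sha256 -out root-ca-new.pem

# Re-sign the intermediate cert with the original intermediate key and the renewed root.
openssl req -new -key intermediate-ca.key -subj "/CN=Example Intermediate CA" \
  -out intermediate.csr
openssl x509 -req -in intermediate.csr -CA root-ca-new.pem -CAkey root-ca.key \
  -CAcreateserial -days 1825 -sha256 -out intermediate-ca-new.pem \
  -extfile <(printf "basicConstraints=critical,CA:TRUE\nkeyUsage=critical,keyCertSign,cRLSign\n")

# Concatenate the chain and base64 encode it for the ca.crt value in the ca-cert secret.
cat intermediate-ca-new.pem root-ca-new.pem > ca-chain.pem
base64 -w0 ca-chain.pem > ca-chain.pem.b64
```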
During the next reconciliation, Strimzi updates the ZK node certs, then restarts the ZK pods. After the second ZK pod restarts, all broker pods start to have connection issues with ZK, displaying "certificate_unknown" errors:
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
After it restarts all ZK pods, the operator creates new broker certs, but then cannot connect to the Kafka broker pods, displaying errors such as:
2025-07-14 13:54:45 ERROR NetworkClient:839 - [AdminClient clientId=adminclient-890] Connection to node -2 (kf-kafka-kafka-1.kf-kafka-kafka-brokers.kf.svc.cluster.local/10.245.3.182:9091) failed authentication due to: Failed to process post-handshake messages
Eventually, one broker will be force rolled:
2025-07-14 13:56:53 WARN KafkaRoller:498 - Reconciliation #1330(watch) Kafka(kf/kf-kafka): Pod kf-kafka-kafka-0/0 will be force-rolled, due to error: Error while trying to determine the cluster controller from pod kf-kafka-kafka-0, caused by:Failed to process post-handshake messages
After that, the operator can sometimes restart the other brokers, but more often it encounters reconciliation errors due to ISR or ForceableProblem issues:
2025-07-14 15:24:36 INFO KafkaRoller:388 - Reconciliation #307(watch) Kafka(kf/kf-kafka): Will temporarily skip verifying pod kf-kafka-kafka-0/0 is up-to-date due to ForceableProblem: Pod kf-kafka-kafka-0 is the active controller and there are other pods to verify first, retrying after at least 250ms
After the Kafka pods are restarted, either automatically or manually, the operator finishes rolling the entity operator and exporter pods, and reconciliation completes.
During the update, a simple client configured with a KafkaUser secret initially starts timing out, then has SSL handshake issues until it is restarted after the update.
FWIW: If I configure the cluster with only the root CA and key, the replacement process works as I'd expect with the new root CA/original key.
My questions:
I've included relevant sections from a broker log and the operator log:
broker.log
operator.log