Strimzi Entity Operator (Topic Operator) Failing Probes Due to Kafka Connection Issues with KRaft Cluster #11524
Closed
periyasamy003 started this conversation in General
Replies: 2 comments
-
Hey, you mentioned AMQ Streams, which is a product from Red Hat. You should contact their support if you have an issue with the product.
-
As Lukas suggested, please contact Red Hat support next time. But in short,
-
kafka-crds.zip
[amqstreams.v2.9.0-3] - Operator in OpenShift
netpolicy-kafka.zip
Problem Description:
The Strimzi Entity Operator's Topic Operator container was consistently failing its startup and readiness probes, exhibiting two distinct symptoms:
- Startup probe failed: ... connect: connection refused on port 8080 (indicating the application either crashed immediately or wasn't listening).
- Readiness probe failed: HTTP probe failed with statuscode: 500 (indicating the application was listening, but its internal readiness check was failing, almost always due to an inability to connect to Kafka).

Troubleshooting Steps & Findings:
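Both symptoms are visible in the pod events and in the Topic Operator container log. The following is only a sketch of how one might pull them. It assumes the kafka-bt namespace and the kafka-bt-cluster-entity-operator Deployment named in this post, plus the container name topic-operator, which is the usual Strimzi name but is not confirmed here. The classify_probe_failure helper is hypothetical and merely encodes the interpretation given in the problem description.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a probe failure message to the interpretation
# given in the problem description.
classify_probe_failure() {
  case "$1" in
    *"connection refused"*) echo "not-listening" ;;        # crashed, or port 8080 not open
    *"statuscode: 500"*)    echo "listening-not-ready" ;;  # up, but the readiness check fails
    *)                      echo "unknown" ;;
  esac
}

# Assumed names: namespace kafka-bt, Deployment kafka-bt-cluster-entity-operator,
# container topic-operator. Only defined here; call it against your own cluster.
show_probe_failures() {
  oc get events -n kafka-bt --field-selector reason=Unhealthy
  oc logs -n kafka-bt deployment/kafka-bt-cluster-entity-operator -c topic-operator --tail=50
}
```

The log output is usually the more informative of the two, since the probe events only report that the check failed, not why.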
Network Policy Review:
- NetworkPolicy files were reviewed: kafka-bt-allow-kafka-egress, kafka-bt-allow-kafka-ingress, kafka-bt-allow-entity-operator-egress, kafka-bt-allow-entity-operator-ingress-self.
- [...] (53/UDP/TCP).
- [...] (9094).
- [...] (6443/TCP).
- kafka-bt-allow-dns-netpolicy was found to be incorrectly configured for port 5353/UDP instead of the standard DNS port 53/UDP/TCP. This was corrected, as it would affect all pods.

Environment Variable Check (Root Cause Identification):
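The check in this step might look like the sketch below. It assumes the Topic Operator container is named topic-operator (the usual Strimzi name, not confirmed in this post) and that printenv is available in the container image. bootstrap_port is a hypothetical helper that only splits the port off a host:port value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the port part of a host:port bootstrap address.
bootstrap_port() {
  echo "${1##*:}"
}

# Read the variable from the running pod.
# Assumed container name: topic-operator. Only defined here; call it against your cluster.
read_bootstrap_servers() {
  oc exec -n kafka-bt deployment/kafka-bt-cluster-entity-operator -c topic-operator -- \
    printenv STRIMZI_KAFKA_BOOTSTRAP_SERVERS
}

# Example usage: bootstrap_port "$(read_bootstrap_servers)"
```

Printing only the port makes it easy to compare against the listeners defined in the Kafka custom resource.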
- The STRIMZI_KAFKA_BOOTSTRAP_SERVERS environment variable inside the running kafka-bt-cluster-entity-operator pod was checked via the oc exec command.
- It was set to kafka-bt-cluster-kafka-bootstrap:9091 instead of the correct kafka-bt-cluster-kafka-bootstrap:9093 (the internal listener for KRaft).
- The Kafka Custom Resource (oc get kafka kafka-bt-cluster -n kafka-bt -o yaml) revealed the underlying problem: the spec.entityOperator.topicOperator and spec.entityOperator.userOperator sections were missing the template.container.env blocks required to inject this override. Without this, the Entity Operator was defaulting to an incorrect listener port for connection.

Solution Applied/Proposed:
Correct Network Policies: Ensure all necessary NetworkPolicy resources are correctly applied, specifically:
- kafka-bt-deny-all-netpolicy (if applicable).
- kafka-bt-allow-dns-netpolicy (allowing port 53/UDP/TCP).
- kafka-bt-allow-entity-operator-egress, including Kubernetes API server access.

Critical Kafka CR Fix:
- The Kafka Custom Resource (kafka-bt-cluster) was updated to include the template.container.env override for both topicOperator and userOperator within the entityOperator section. This explicitly sets STRIMZI_KAFKA_BOOTSTRAP_SERVERS to kafka-bt-cluster-kafka-bootstrap:9093.
- The change can be made by running oc apply with a corrected YAML file or by running oc edit on the resource directly.

Expected Outcome:
With the STRIMZI_KAFKA_BOOTSTRAP_SERVERS environment variable correctly pointing to 9093 and the necessary network policies in place, the Entity Operator should now be able to establish a successful connection to the Kafka cluster, resolve the 500 Internal Server Error on its readiness probes, and start functioning correctly.
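One way to confirm this outcome is sketched below with standard oc commands. It assumes the kafka-bt namespace and Deployment name used in this post, and that the pods carry the label strimzi.io/name=kafka-bt-cluster-entity-operator (an assumption about how Strimzi labels them). all_ready is a hypothetical helper that checks a space-separated list of container readiness flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the list is non-empty and every flag is "true".
all_ready() {
  [ -n "$1" ] || return 1
  for flag in $1; do
    [ "$flag" = "true" ] || return 1
  done
}

# Wait for the rollout, then check that every container in the pod reports ready.
# Only defined here; call it against your own cluster.
verify_entity_operator() {
  oc rollout status deployment/kafka-bt-cluster-entity-operator -n kafka-bt --timeout=120s
  flags=$(oc get pods -n kafka-bt -l strimzi.io/name=kafka-bt-cluster-entity-operator \
    -o jsonpath='{.items[*].status.containerStatuses[*].ready}')
  all_ready "$flags"
}
```

If verify_entity_operator returns non-zero, the probe events and container logs are the next place to look.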