Replies: 2 comments
-
If this were the case, it would show up only on startup. If it happens while the cluster is running, it sounds more like a networking issue, judging from this part of the log?
-
Ah, it turns out my Zookeeper pods were just running out of memory. Easy fix!
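For anyone hitting the same thing, the fix was along these lines in the Kafka custom resource; the values below are illustrative, not the exact ones we used:

```yaml
# Fragment of the Kafka custom resource (kafka.strimzi.io/v1beta2);
# only the zookeeper section is shown, values are illustrative.
spec:
  zookeeper:
    replicas: 3
    resources:
      requests:
        memory: 1Gi
      limits:
        memory: 1Gi        # raise this if the pods are being OOMKilled
    jvmOptions:
      "-Xms": 512m         # keep the heap comfortably below the container limit
      "-Xmx": 512m
```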
-
We are evaluating Strimzi for running Kafka. Overall it is looking good, but we have hit a critical issue that we have not been able to figure out. Over the lifetime of each of our clusters, Zookeeper periodically gets into a state where every pod is in CrashLoopBackOff. Eventually the pods start running again, but they inevitably end up back in this state.
The logs are fairly consistent and seem to show the Zookeeper pods being unable to connect to each other. This ends in a caught NullPointerException before the process exits with exit code 0. My best guess is that the liveness checks are killing pods before all of them are up and able to communicate. I am working on testing this, but I won't know the outcome until it has been running long enough for the issue to recur.
We are running Strimzi 0.28.0 with Kafka 3.1.0 on Kubernetes 1.21.2.
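For reference, this is roughly the change I am testing: relaxing the Zookeeper probe timings in the Kafka custom resource. The values are illustrative, not a recommendation:

```yaml
# Fragment of the Kafka custom resource (kafka.strimzi.io/v1beta2);
# only the zookeeper section is shown, values are illustrative.
spec:
  zookeeper:
    replicas: 3
    livenessProbe:
      initialDelaySeconds: 60   # raised from the default to give the quorum time to form
      timeoutSeconds: 10
      failureThreshold: 6
    readinessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 10
```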
Here are the relevant logs, with stack traces trimmed out after the first line.