Strimzi Kafka pod terminates and becomes active again after a certain period of time #11581
vaibhav0401 asked this question in Q&A (unanswered)
-
Please keep the discussion in one place. As I mentioned on Slack: this is not much information about the problem. There seems to be some connection issue, but what happened before it started? Did you do an upgrade? Did you change anything? Are you using ZooKeeper or KRaft? What is the operator version?
-
Hi, I have 5 Kafka pods. At any given time one of them terminates and comes back up after a certain period, and this causes the application to crash, even though 24 GB of JVM memory has been provided to it.
operatorLastSuccessfulVersion: 0.38.0
Error -
2025-06-24 02:18:10,052 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error sending fetch request (sessionId=790042141, epoch=INITIAL) to node 3: (org.apache.kafka.clients.FetchSessionHandler) [ReplicaFetcherThread-0-3]
java.net.SocketTimeoutException: Failed to connect within 30000 ms
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:109)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:18:10,053 WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=2, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={fm-ingestion-0=PartitionData(topicId=iILVc1EqSpW5sLdVu7toIw, fetchOffset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[156], lastFetchedEpoch=Optional.empty)}, isolationLevel=READ_UNCOMMITTED, removed=, replaced=, metadata=(sessionId=790042141, epoch=INITIAL), rackId=) (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-3]
java.net.SocketTimeoutException: Failed to connect within 30000 ms
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:109)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:18:41,072 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Client requested connection close from node 3 (org.apache.kafka.clients.NetworkClient) [ReplicaFetcherThread-0-3]
2025-06-24 02:18:41,073 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error sending fetch request (sessionId=790042141, epoch=INITIAL) to node 3: (org.apache.kafka.clients.FetchSessionHandler) [ReplicaFetcherThread-0-3]
java.net.SocketTimeoutException: Failed to connect within 30000 ms
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:109)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:18:41,073 WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=2, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={fm-ingestion-0=PartitionData(topicId=iILVc1EqSpW5sLdVu7toIw, fetchOffset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[156], lastFetchedEpoch=Optional.empty)}, isolationLevel=READ_UNCOMMITTED, removed=, replaced=, metadata=(sessionId=790042141, epoch=INITIAL), rackId=) (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-3]
java.net.SocketTimeoutException: Failed to connect within 30000 ms
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:109)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:19:12,104 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Client requested connection close from node 3 (org.apache.kafka.clients.NetworkClient) [ReplicaFetcherThread-0-3]
2025-06-24 02:19:12,104 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error sending fetch request (sessionId=790042141, epoch=INITIAL) to node 3: (org.apache.kafka.clients.FetchSessionHandler) [ReplicaFetcherThread-0-3]
java.net.SocketTimeoutException: Failed to connect within 30000 ms
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:109)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:19:12,105 WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=2, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={fm-ingestion-0=PartitionData(topicId=iILVc1EqSpW5sLdVu7toIw, fetchOffset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[156], lastFetchedEpoch=Optional.empty)}, isolationLevel=READ_UNCOMMITTED, removed=, replaced=, metadata=(sessionId=790042141, epoch=INITIAL), rackId=) (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-3]
java.net.SocketTimeoutException: Failed to connect within 30000 ms
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:109)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:24:12,964 WARN Client session timed out, have not heard from server in 279157ms for session id 0x301e3a401100050 (org.apache.zookeeper.ClientCnxn) [main-SendThread(strimzi-cluster-zookeeper-client:2181)]
2025-06-24 02:24:12,964 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Client requested connection close from node 3 (org.apache.kafka.clients.NetworkClient) [ReplicaFetcherThread-0-3]
2025-06-24 02:24:13,077 WARN Session 0x301e3a401100050 for server strimzi-cluster-zookeeper-client/172.30.92.148:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. (org.apache.zookeeper.ClientCnxn) [main-SendThread(strimzi-cluster-zookeeper-client:2181)]
org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 279157ms for session id 0x301e3a401100050
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1257)
2025-06-24 02:24:13,078 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error sending fetch request (sessionId=790042141, epoch=INITIAL) to node 3: (org.apache.kafka.clients.FetchSessionHandler) [ReplicaFetcherThread-0-3]
java.net.SocketTimeoutException: Failed to connect within 30000 ms
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:109)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:24:13,079 INFO [ReplicaFetcher replicaId=2, leaderId=4, fetcherId=0] Disconnecting from node 4 due to request timeout. (org.apache.kafka.clients.NetworkClient) [ReplicaFetcherThread-0-4]
2025-06-24 02:24:13,080 WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=2, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={fm-ingestion-0=PartitionData(topicId=iILVc1EqSpW5sLdVu7toIw, fetchOffset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[156], lastFetchedEpoch=Optional.empty)}, isolationLevel=READ_UNCOMMITTED, removed=, replaced=, metadata=(sessionId=790042141, epoch=INITIAL), rackId=) (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-3]
java.net.SocketTimeoutException: Failed to connect within 30000 ms
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:109)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:24:13,081 INFO [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Partition unit-nims-inventory-feeder-0 has an older epoch (1495) than the current leader. Will await the new LeaderAndIsr state before resuming fetching. (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-0]
2025-06-24 02:24:13,081 INFO [ReplicaFetcher replicaId=2, leaderId=4, fetcherId=0] Cancelled in-flight FETCH request with correlation id 14624 due to node 4 being disconnected (elapsed time since creation: 273504ms, elapsed time since send: 273504ms, request timeout: 30000ms) (org.apache.kafka.clients.NetworkClient) [ReplicaFetcherThread-0-4]
2025-06-24 02:24:13,082 INFO [ReplicaFetcher replicaId=2, leaderId=4, fetcherId=0] Client requested connection close from node 4 (org.apache.kafka.clients.NetworkClient) [ReplicaFetcherThread-0-4]
2025-06-24 02:24:13,082 INFO [ReplicaFetcher replicaId=2, leaderId=4, fetcherId=0] Error sending fetch request (sessionId=1857628606, epoch=14624) to node 4: (org.apache.kafka.clients.FetchSessionHandler) [ReplicaFetcherThread-0-4]
java.io.IOException: Connection to 4 was disconnected before the response was read
at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:99)
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:113)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:24:13,082 WARN [ReplicaFetcher replicaId=2, leaderId=4, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=2, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={}, isolationLevel=READ_UNCOMMITTED, removed=, replaced=, metadata=(sessionId=1857628606, epoch=14624), rackId=) (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-4]
java.io.IOException: Connection to 4 was disconnected before the response was read
at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:99)
at kafka.server.BrokerBlockingSender.sendRequest(BrokerBlockingSender.scala:113)
at kafka.server.RemoteLeaderEndPoint.fetch(RemoteLeaderEndPoint.scala:79)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:316)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
2025-06-24 02:24:13,083 INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Partition __consumer_offsets-46 has an older epoch (1359) than the current leader. Will await the new LeaderAndIsr state before resuming fetching. (kafka.server.ReplicaFetcherThread) [ReplicaFetcherThread-0-1]
2025-06-24 02:24:13,084 INFO [GroupCoordinator 2]: Member consumer-importing-granite-data-group-3-15650cb9-d185-44c7-823b-ef30b5dbf843 in group importing-granite-data-group has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [executor-Heartbeat]
2025-06-24 02:24:13,085 INFO [GroupCoordinator 2]: Preparing to rebalance group importing-granite-d
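To make the pattern in the log above easier to read, here is a minimal sketch (not part of the original report) that counts "Error sending fetch request" lines per replica and leader. The function name `count_fetch_errors` and the `SAMPLE` list are hypothetical; the sample lines are copied from the log above. It only tallies log lines and does not diagnose the cause.

```python
import re
from collections import Counter

# Matches the prefix Kafka puts on replica fetcher log lines, e.g.
# "[ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0]".
FETCHER = re.compile(
    r"\[ReplicaFetcher replicaId=(\d+), leaderId=(\d+), fetcherId=\d+\]"
)


def count_fetch_errors(lines):
    """Count 'Error sending fetch request' lines per (replicaId, leaderId)."""
    counts = Counter()
    for line in lines:
        if "Error sending fetch request" not in line:
            continue
        match = FETCHER.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# A few lines copied from the broker log above (hypothetical sample selection).
SAMPLE = [
    "2025-06-24 02:18:10,052 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error sending fetch request (sessionId=790042141, epoch=INITIAL) to node 3: (org.apache.kafka.clients.FetchSessionHandler) [ReplicaFetcherThread-0-3]",
    "2025-06-24 02:18:41,072 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Client requested connection close from node 3 (org.apache.kafka.clients.NetworkClient) [ReplicaFetcherThread-0-3]",
    "2025-06-24 02:18:41,073 INFO [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Error sending fetch request (sessionId=790042141, epoch=INITIAL) to node 3: (org.apache.kafka.clients.FetchSessionHandler) [ReplicaFetcherThread-0-3]",
    "2025-06-24 02:24:13,082 INFO [ReplicaFetcher replicaId=2, leaderId=4, fetcherId=0] Error sending fetch request (sessionId=1857628606, epoch=14624) to node 4: (org.apache.kafka.clients.FetchSessionHandler) [ReplicaFetcherThread-0-4]",
]

for (replica, leader), n in sorted(count_fetch_errors(SAMPLE).items()):
    print(f"replica {replica} -> leader {leader}: {n} failed fetch request(s)")
```

In the excerpt above, these lines appear for leader 3 at 02:18:10, 02:18:41, 02:19:12 and 02:24:13, and once for leader 4 at 02:24:13. Running the same count over the full logs of every broker might show whether the failures concentrate on one broker.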
Pod description events
Events:
Type Reason Age From Message
Normal Killing 2m50s kubelet Stopping container kafka
Warning Unhealthy 2m16s kubelet Readiness probe errored: rpc error: code = Unknown desc = command error: time="2025-06-24T06:45:47Z" level=fatal msg="nsexec-1[149007]: failed to open /proc/1903671/ns/ipc: No such file or directory"
time="2025-06-24T06:45:47Z" level=fatal msg="nsexec-0[149000]: failed to sync with stage-1: next state: Invalid argument"
time="2025-06-24T06:45:47Z" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1
Warning Unhealthy 106s kubelet Readiness probe errored: rpc error: code = Unknown desc = command error: time="2025-06-24T06:46:17Z" level=fatal msg="nsexec-1[160990]: failed to open /proc/1903671/ns/ipc: No such file or directory"
time="2025-06-24T06:46:17Z" level=fatal msg="nsexec-0[160981]: failed to sync with stage-1: next state: Invalid argument"
time="2025-06-24T06:46:17Z" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1
Warning Unhealthy 76s kubelet Readiness probe errored: rpc error: code = Unknown desc = command error: time="2025-06-24T06:46:47Z" level=fatal msg="nsexec-1[174221]: failed to open /proc/1903671/ns/ipc: No such file or directory"
time="2025-06-24T06:46:47Z" level=fatal msg="nsexec-0[174214]: failed to sync with stage-1: next state: Invalid argument"
time="2025-06-24T06:46:47Z" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1
Warning Unhealthy 46s kubelet Readiness probe errored: rpc error: code = Unknown desc = command error: time="2025-06-24T06:47:17Z" level=fatal msg="nsexec-1[186454]: failed to open /proc/1903671/ns/ipc: No such file or directory"
time="2025-06-24T06:47:17Z" level=fatal msg="nsexec-0[186442]: failed to sync with stage-1: next state: Invalid argument"
time="2025-06-24T06:47:17Z" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1
Warning Unhealthy 16s kubelet Readiness probe errored: rpc error: code = Unknown desc = command error: time="2025-06-24T06:47:47Z" level=fatal msg="nsexec-1[198411]: failed to open /proc/1903671/ns/ipc: No such file or directory"
time="2025-06-24T06:47:47Z" level=fatal msg="nsexec-0[198402]: failed to sync with stage-1: next state: Invalid argument"
time="2025-06-24T06:47:47Z" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1
Thank you in advance.