kafka.Consumer.Pause([]TopicPartition) hangs: could rd_kafka_q_wait_result(tmpq, RD_POLL_INFINITE) in rd_kafka_toppars_pause_resume be the cause? #4705
alexseel-a3949
started this conversation in
General
I am reluctant to raise a defect, since nobody else seems to be reporting this issue. We hit it in production: when we paused the Kafka consumer, the call hung and we had to kill a live service.
Looking at the librdkafka C code, the function rd_kafka_toppars_pause_resume (rdkafka_partition.c, line 2382) looks suspicious. When it runs synchronously (which it does when called from the higher-level calls), it seems to poll the partitions and wait for each response with no timeout.
This fits with what we see.
We will be doing more reproduction tracing with Debug:All enabled to dump the full Kafka logs. In the meantime, we would like a recommendation for how to pause the assigned topic partitions robustly, even when a problem with the topic, brokers, or replicas makes this function hang forever at line 2431:
    if (!async) {
            while (waitcnt-- > 0)
                    rd_kafka_q_wait_result(tmpq, RD_POLL_INFINITE);
Would be great if anyone from the community or Confluent team could advise best practice here.
Thanks,
Alex