Metadata requests for a topic with many partitions #3916
Unanswered — travisdowns asked this question in Q&A (0 replies)
Consider a scenario where 50k distinct consumer processes using librdkafka consume from a single topic with 50k partitions via a consumer group. Since there is a 1:1 ratio of consumers to partitions, each consumer is ultimately only consuming from ~1 partition (with some small variation as consumers die or rejoin the group).
However, empirically, the metadata requests ask about all partitions in the topic, so that's 50k partitions * 50k clients = 2.5 billion partitions' worth of metadata sent every refresh interval (ignoring entirely any additional metadata refreshes triggered by events such as a topic leader change). If each partition takes ~100 bytes (a reasonable value, empirically, with 3 replicas per partition) that's 250 GB of traffic every 300s (by default) or ~6.7 Gbps of constant load just from the periodic metadata refreshes.
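A quick back-of-the-envelope check of those numbers (the 300s figure corresponds to librdkafka's default metadata refresh interval; the 100 bytes/partition is the empirical estimate above):

```python
# Sanity-check the metadata traffic estimate from the scenario above.
consumers = 50_000
partitions = 50_000
bytes_per_partition = 100      # empirical estimate, 3 replicas per partition
refresh_interval_s = 300       # default refresh interval, in seconds

total_bytes = consumers * partitions * bytes_per_partition
gbps = total_bytes * 8 / refresh_interval_s / 1e9

print(total_bytes)             # 250_000_000_000 -> 250 GB per refresh interval
print(round(gbps, 1))          # 6.7 Gbps sustained
```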
Is there any way around this? The metadata API does not seem to admit any feature to ask about only a subset of the partitions in a topic: you may provide a list of topics, but not a list of partitions within those topics. So clients will retrieve metadata for all 50k partitions even though each cares about only one.
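To illustrate what a per-partition filter would buy: if each consumer could request metadata for only the ~1 partition it owns, the aggregate traffic would shrink in proportion to the partition count. A minimal sketch of that comparison, reusing the assumptions above:

```python
# Hypothetical comparison: metadata traffic with vs. without a
# per-partition filter in the Metadata API (which does not exist today).
consumers = 50_000
partitions = 50_000
bytes_per_partition = 100      # same empirical estimate as above

# Today: every client receives metadata for every partition in the topic.
unfiltered = consumers * partitions * bytes_per_partition   # 250 GB/interval

# Hypothetical: each client asks about only the ~1 partition it consumes.
filtered = consumers * 1 * bytes_per_partition              # ~5 MB/interval

print(unfiltered // filtered)  # 50_000x reduction
```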
The easy answer is "don't do that [50k partitions in one topic]", but those of us building infrastructure aren't always in a position to choose what users do.