
Question on end to end deduplication #143

Answered by PabloPardoGarcia
azhur asked this question in Support

Hi Azhur, that's a great question. GlassFlow handles deduplication internally, but there are a couple of edge cases where duplicates can still make their way into ClickHouse:

  • If the Kafka sink fails, the message is routed to the dead-letter queue (DLQ). At that point, it's up to the user to reprocess it, and depending on how that's handled, duplicates may be introduced.
  • If the client’s Kafka is down for longer than the deduplication window, we can also introduce duplicates. This is because NATS (used internally for transport) currently checks the time window against the timestamp at which the message reached NATS, not the original Kafka event timestamp. As a result, delayed messages can land outside the deduplication window and slip through as duplicates (see the sketch after this list).
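
To make the second case concrete, here is a minimal Python sketch of how a window check keyed on arrival time (rather than the original event time) lets a late duplicate through. The names (`is_duplicate`, `DEDUP_WINDOW_SECONDS`) and the in-memory dict are illustrative assumptions, not GlassFlow's or NATS's actual implementation:

```python
import time

# Assumed 1-hour deduplication window for this sketch.
DEDUP_WINDOW_SECONDS = 3600

# message_id -> arrival timestamp of the last copy we accepted
seen: dict[str, float] = {}

def is_duplicate(message_id: str, arrival_ts: float) -> bool:
    """Return True if the same ID already arrived within the window.

    The window is measured from the *arrival* timestamp (when the message
    reached the transport layer), not from the original Kafka event time.
    """
    first_seen = seen.get(message_id)
    if first_seen is not None and arrival_ts - first_seen <= DEDUP_WINDOW_SECONDS:
        return True  # same ID seen again inside the window -> deduplicated
    seen[message_id] = arrival_ts
    return False

# Scenario: the client's Kafka is down for longer than the window and then
# re-delivers an event that was already processed before the outage.
now = time.time()
print(is_duplicate("order-42", arrival_ts=now))             # False: first copy accepted
print(is_duplicate("order-42", arrival_ts=now + 2 * 3600))  # False: arrives > 1h later,
                                                            # so the duplicate slips through
```

If the window were checked against the original Kafka event timestamp instead, the second copy would still fall inside it and be dropped.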
