Replies: 1 comment
-
If records are written before a flush, how do we ensure that they are not duplicated in the case of a failure before the flush? A restart would then, presumably, write the same records again, resulting in duplicate records in storage.
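One common way sinks of this kind sidestep the redelivery problem is deterministic, offset-derived object names, so a rewritten batch overwrites the first attempt instead of accumulating. The sketch below is a hypothetical illustration of that idea (`Redelivery`, `writeBatch`, and the `part-0/offset-N` naming are invented for this example, not the connector's actual API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the failure window described above: records reach storage before
// the flush that commits offsets. After a crash, delivery restarts from the
// last committed offset and the same records are written again. Deterministic
// object names make the rewrite an overwrite, not a duplicate.
public class Redelivery {
    // Stand-in "storage" keyed by object name; a real sink would use S3/GCS.
    static final Map<String, String> storage = new HashMap<>();

    static void writeBatch(List<String> records, long startOffset) {
        for (int i = 0; i < records.size(); i++) {
            // Object name derived purely from the record's offset.
            storage.put("part-0/offset-" + (startOffset + i), records.get(i));
        }
    }

    public static void main(String[] args) {
        List<String> batch = List.of("a", "b");
        writeBatch(batch, 0); // first attempt; crash before offsets commit
        writeBatch(batch, 0); // redelivery after restart from offset 0
        System.out.println(storage.size()); // still 2 objects, not 4
    }
}
```

With content-dependent or timestamped names, the same redelivery would instead leave two copies of each record in storage.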
-
The current implementation buffers records and writes them based on a flush. A more appropriate approach is, IMHO, to stream the records to file. In this model the `RecordGrouper` would not collect the records in a buffer, but would instead indicate the logical stream to which the records should be written; the caller (the task) manages how the records are handled, i.e. appended to a "logical stream". This opens up several possibilities and improvements.

Other possibilities:
- We can use `requestFlush()` when files have been written to update the metrics, rather than relying on timeouts and flushing everything.

Issues:
- `preCommit` action

A simple implementation of this is in #319, but it also has some other changes.
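The streaming model sketched in this comment could look roughly like the following. This is a hypothetical illustration, not the code from #319: `SinkRecord`, `StreamingRecordGrouper`, and `StreamingTask` are stand-ins, and the grouper's only job is to name the logical stream while the task owns the writers.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Kafka Connect's record type, trimmed to what the sketch needs.
final class SinkRecord {
    final String topic;
    final int partition;
    final String value;
    SinkRecord(String topic, int partition, String value) {
        this.topic = topic;
        this.partition = partition;
        this.value = value;
    }
}

// Instead of buffering, the grouper only indicates the logical stream a
// record belongs to; it never holds records itself.
interface StreamingRecordGrouper {
    String streamFor(SinkRecord record);
}

// The task manages the per-stream writers and appends records as they arrive.
final class StreamingTask {
    private final StreamingRecordGrouper grouper;
    final Map<String, StringBuilder> openStreams = new HashMap<>();

    StreamingTask(StreamingRecordGrouper grouper) { this.grouper = grouper; }

    void put(SinkRecord record) {
        String stream = grouper.streamFor(record);
        openStreams.computeIfAbsent(stream, s -> new StringBuilder())
                   .append(record.value).append('\n');
    }

    // On preCommit, finalize all open streams and report what was written,
    // so offsets are committed only for data that reached storage.
    Map<String, String> preCommit() {
        Map<String, String> written = new HashMap<>();
        openStreams.forEach((stream, buf) -> written.put(stream, buf.toString()));
        openStreams.clear();
        return written;
    }
}

public class Demo {
    public static void main(String[] args) {
        StreamingTask task = new StreamingTask(r -> r.topic + "-" + r.partition);
        task.put(new SinkRecord("t", 0, "a"));
        task.put(new SinkRecord("t", 0, "b"));
        task.put(new SinkRecord("t", 1, "c"));
        Map<String, String> written = task.preCommit();
        System.out.println(written.size());            // 2 streams finalized
        System.out.println(task.openStreams.isEmpty()); // true
    }
}
```

The key design point is that no component other than the task holds records, so per-file completion can trigger a targeted `requestFlush()`-style metrics update instead of a timeout-driven flush of everything.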