Yep, currently it does the following:
- See if the block is already present, in which case we don't need to do the work to compress it.
- Write to a temporary file, so that if the process is interrupted we won't be left with an incomplete file under the final name. (This is not 100% guaranteed by the filesystem, but it's the usual pattern.)
- Rename into place.
This is pretty reasonable (although perhaps not optimal) locally, but not good if the filesystem has very high latency, since the check, the write, and the rename are each a separate round trip.
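As a rough illustration of the three steps above, here is a minimal sketch using only the Rust standard library. It is not the project's actual code: the function name `store_block`, the `.tmp` suffix, and the flat directory layout are all assumptions made for the example, and compression is left out.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: store `data` as `dir/name`.
/// Returns Ok(false) if the block was already present, Ok(true) if it was written.
fn store_block(dir: &Path, name: &str, data: &[u8]) -> io::Result<bool> {
    let final_path = dir.join(name);

    // Step 1: presence check. On a remote filesystem this is one round trip.
    if final_path.exists() {
        return Ok(false);
    }

    // Step 2: write to a temporary name, so an interrupted process
    // does not leave a partial file under the final name.
    let tmp_path = dir.join(format!("{}.tmp", name));
    {
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(data)?;
        file.sync_all()?;
    }

    // Step 3: rename into place. This is another round trip.
    fs::rename(&tmp_path, &final_path)?;
    Ok(true)
}
```

Each call makes at least three sequential filesystem operations for a new block, which is cheap locally but adds up when every operation pays network latency.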
A few options:
- Just issue more parallel IO.
- Remember which blocks are referenced by the basis index: we can already assume they're present and should not need to check. (The most common case, of an unchanged file, does not check, but there might be other edge cases. This should be pretty rare.)
- Similarly, remember which blocks we've already seen to be present. (Cache in RAM for presence of blocks #106)
- If we have a Transport API for the remote filesystem, then in some cases that may already support a reliable atomic write that cannot leave the file half-written. For example, this should be possible on S3. Then we don't need the rename.
- Even on Unix or Windows maybe a faster atomic write is possible?
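The two "remember" options above could be combined into one in-memory set, sketched below. This is only an illustration under assumptions: `PresenceCache`, its methods, and the `probe` callback (standing in for the real filesystem check) are hypothetical names, and block hashes are represented as plain strings.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory record of block hashes known to be present.
struct PresenceCache {
    known: HashSet<String>,
    /// Number of times we had to fall back to the (slow) filesystem check.
    probes: usize,
}

impl PresenceCache {
    /// Seed the cache with hashes referenced by the basis index,
    /// which are assumed to be present already.
    fn from_basis<I: IntoIterator<Item = String>>(basis_hashes: I) -> Self {
        PresenceCache {
            known: basis_hashes.into_iter().collect(),
            probes: 0,
        }
    }

    /// Return whether `hash` is present, calling `probe` only on a cache miss.
    fn is_present<F: FnMut(&str) -> bool>(&mut self, hash: &str, mut probe: F) -> bool {
        if self.known.contains(hash) {
            return true;
        }
        self.probes += 1;
        let present = probe(hash);
        if present {
            // Remember the positive result so later lookups skip the probe.
            self.known.insert(hash.to_string());
        }
        present
    }

    /// Record a block that we have just written ourselves.
    fn mark_written(&mut self, hash: &str) {
        self.known.insert(hash.to_string());
    }
}
```

Only positive results are cached here, on the assumption that blocks are never deleted during a backup; a missing block is probed again each time until it is written.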
Originally posted by @sourcefrog in #177 (comment)