Optimize look-write-rename for writing blocks #181

@sourcefrog

Yep, writing a block currently does the following:

  1. See if the block is already present, in which case we don't need to do the work to compress it.
  2. Write to a temporary file, so that if the process is interrupted we won't be left with an incomplete file under the final name. (This is not 100% guaranteed by the filesystem, but it's the usual pattern.)
  3. Rename into place.

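This is pretty reasonable (although perhaps not optimal) on a local filesystem, but it is not good if the filesystem has very high latency, since each of the three steps is a separate operation that has to wait on the filesystem.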
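For reference, the look-write-rename sequence can be sketched roughly as below. This is a minimal illustration using only the Rust standard library, not the project's actual code: the function name `store_block`, the flat `<dir>/<hash>` layout, and the `tmp-<pid>-<hash>` temporary name are all assumptions made for the example, and compression is left out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: store `data` as `<dir>/<hash>` unless it already exists.
/// Returns Ok(true) if a new file was written, Ok(false) if it was already present.
fn store_block(dir: &Path, hash: &str, data: &[u8]) -> io::Result<bool> {
    let final_path = dir.join(hash);

    // 1. Look: if the block is already present there is nothing to do.
    if final_path.try_exists()? {
        return Ok(false);
    }

    // 2. Write under a temporary name in the same directory, so an
    //    interrupted process never leaves a partial file at the final name.
    //    (Real code would compress `data` before writing it.)
    let tmp_path = dir.join(format!("tmp-{}-{}", std::process::id(), hash));
    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(data)?;
    tmp.sync_all()?;
    drop(tmp);

    // 3. Rename into place.
    fs::rename(&tmp_path, &final_path)?;
    Ok(true)
}
```

Each numbered comment corresponds to one filesystem operation, which is why the cost adds up on a high-latency filesystem.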

A few options:

  1. Just issue more parallel IO.
  2. Remember which blocks are referenced by the basis index: we can already assume they're present and should not need to check. (The most common case, of an unchanged file, does not check, but there might be other edge cases that still do. These should be pretty rare.)
  3. Similarly, remember blocks that we've already seen to be present. (Cache in RAM for presence of blocks #106)
  4. If we have a Transport API for the remote filesystem, then in some cases that may already support a reliable atomic write that cannot leave the file half-written. For example this should be possible on S3. Then we don't need the rename.
  5. Even on Unix or Windows maybe a faster atomic write is possible?
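
Options 2 and 3 both amount to keeping an in-memory set of block hashes known to be present, so that the "look" step can be skipped. A rough sketch of that idea follows. The type `KnownBlocks` and its methods are hypothetical names for illustration, and the `probe` closure stands in for whatever existence check the storage layer actually performs.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory record of block hashes known to be present.
struct KnownBlocks {
    present: HashSet<String>,
}

impl KnownBlocks {
    /// Seed the set with hashes referenced by the basis index, which are
    /// assumed to be stored already (option 2).
    fn from_basis<I: IntoIterator<Item = String>>(hashes: I) -> Self {
        KnownBlocks {
            present: hashes.into_iter().collect(),
        }
    }

    /// Returns true if the block is present. `probe` (the slow remote
    /// check) is only called when the hash is not already in the set,
    /// and a positive answer is remembered (option 3).
    fn is_present<F: FnOnce(&str) -> bool>(&mut self, hash: &str, probe: F) -> bool {
        if self.present.contains(hash) {
            return true;
        }
        let found = probe(hash);
        if found {
            self.present.insert(hash.to_owned());
        }
        found
    }

    /// Record a block that this process has just written.
    fn mark_written(&mut self, hash: &str) {
        self.present.insert(hash.to_owned());
    }
}
```

Only positive results are cached here, on the assumption that blocks are never deleted while a backup is running. A negative result is not remembered, because the block may be written shortly afterwards.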

Originally posted by @sourcefrog in #177 (comment)
