Performance of blockdir prefix directories #180

@sourcefrog

One more thought from #177, cc @road2react and @WolverinDEV:

Conserve's current format puts blocks into subdirectories with a 3-hex-digit name, taken from the first 12 bits of the hash. So there are up to 1<<12 or 4096 of them. This introduces a blocking mkdir before each block file is written.
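
To make the arithmetic concrete, here is a minimal Rust sketch of the prefix mapping. It assumes the block file is named by its full hex hash inside its prefix directory; `block_relpath`, `PREFIX_HEX_DIGITS` and `MAX_PREFIX_DIRS` are hypothetical names for illustration, not Conserve's actual API.

```rust
/// Hex digits in a block's prefix subdirectory name (current format).
pub const PREFIX_HEX_DIGITS: usize = 3;

/// Upper bound on prefix subdirectories: each hex digit carries 4 bits,
/// so 3 digits cover 12 bits and 16^3 == 1 << 12 == 4096.
pub const MAX_PREFIX_DIRS: usize = 1 << (4 * PREFIX_HEX_DIGITS);

/// Split a block's hex hash into (prefix subdirectory, relative file path).
/// Assumes the file is named by the full hash inside the prefix directory.
/// Panics if `hash_hex` is shorter than PREFIX_HEX_DIGITS characters.
pub fn block_relpath(hash_hex: &str) -> (&str, String) {
    let prefix = &hash_hex[..PREFIX_HEX_DIGITS];
    (prefix, format!("{prefix}/{hash_hex}"))
}
```

For example, `block_relpath("fe4b2c19a0")` gives `("fe4", "fe4/fe4b2c19a0")`. Real hashes are much longer; the short string is only illustrative.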

The point of this is to reduce the size of any single directory, although that is probably less of a concern on most local filesystems than in years past. It may actually help with rclone/Box if the client regularly reads whole directories, and it may still be a good idea for VFAT USB drives.

It's probably a net loss on scalable local filesystems? In particular, walking the list of blocks means reading up to 4096 directories.
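
A rough sketch of what that walk costs, again assuming a `<blockdir>/<prefix>/<hash>` layout: one `read_dir` for the top level plus one per prefix directory. `list_blocks` is a hypothetical helper, not Conserve's code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// List every block name under `blockdir`, assuming one level of prefix
/// subdirectories. Returns the sorted names and the number of directories
/// read: one for `blockdir` itself plus one per prefix directory.
pub fn list_blocks(blockdir: &Path) -> io::Result<(Vec<String>, usize)> {
    let mut names = Vec::new();
    let mut dirs_read = 1;
    for prefix_entry in fs::read_dir(blockdir)? {
        let prefix_entry = prefix_entry?;
        if !prefix_entry.file_type()?.is_dir() {
            continue; // skip any stray top-level files
        }
        dirs_read += 1;
        for block_entry in fs::read_dir(prefix_entry.path())? {
            let block_entry = block_entry?;
            // Names that aren't valid UTF-8 can't be hex hashes; skip them.
            if let Some(name) = block_entry.file_name().to_str() {
                names.push(name.to_string());
            }
        }
    }
    names.sort();
    Ok((names, dirs_read))
}
```

With a fully populated 3-digit layout, `dirs_read` would be 4097 no matter how few blocks each prefix holds.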

There are several options, in order of priority:

  1. Remember which subdirectories are known to exist (because we already wrote or saw a block in them), so there's no need to create them again.
  2. In addition, at the start of a backup, read the block directory to see which prefixes are present and remember them. This has the added benefit of quickly answering whether a given hash can possibly be present.
  3. Make the prefix length tunable so that we can at least experiment with different settings, where 0 means no subdirectories. (It should be stored in some archive metadata. It may not be worth allowing it to be changed once the archive exists.)
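
A minimal sketch of how options 1 to 3 might fit together, using a hypothetical `BlockDir` type (not Conserve's real one). It seeds a `HashSet` of known prefixes from a single `read_dir` at open time, skips `create_dir_all` for prefixes already in the set, uses a missing prefix as a cheap "definitely absent" answer, and takes a `prefix_len` where 0 means a flat directory. For brevity it writes block data directly, whereas a real implementation would presumably write atomically (for example via a temporary file and rename).

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical block directory that avoids redundant mkdir calls.
pub struct BlockDir {
    root: PathBuf,
    /// Hex digits per prefix subdirectory; 0 means a flat directory (option 3).
    prefix_len: usize,
    /// Prefix subdirectories known to exist (options 1 and 2).
    known_prefixes: HashSet<String>,
}

impl BlockDir {
    /// Open `root`, seeding the prefix cache from one `read_dir` (option 2).
    pub fn open(root: &Path, prefix_len: usize) -> io::Result<BlockDir> {
        let mut known_prefixes = HashSet::new();
        if prefix_len > 0 {
            for entry in fs::read_dir(root)? {
                let entry = entry?;
                if entry.file_type()?.is_dir() {
                    if let Some(name) = entry.file_name().to_str() {
                        known_prefixes.insert(name.to_string());
                    }
                }
            }
        }
        Ok(BlockDir { root: root.to_path_buf(), prefix_len, known_prefixes })
    }

    fn prefix<'a>(&self, hash_hex: &'a str) -> &'a str {
        &hash_hex[..self.prefix_len]
    }

    /// Cheap negative check: if the prefix directory doesn't exist, the block
    /// can't be present. `true` only means "maybe"; a flat layout always says maybe.
    pub fn may_contain(&self, hash_hex: &str) -> bool {
        self.prefix_len == 0 || self.known_prefixes.contains(self.prefix(hash_hex))
    }

    /// Write a block, calling `create_dir_all` only the first time this
    /// `BlockDir` sees its prefix (option 1). Returns `true` if this call
    /// invoked `create_dir_all`.
    pub fn write_block(&mut self, hash_hex: &str, data: &[u8]) -> io::Result<bool> {
        let mut dir = self.root.clone();
        let mut created = false;
        if self.prefix_len > 0 {
            let prefix = self.prefix(hash_hex).to_string();
            dir.push(&prefix);
            if !self.known_prefixes.contains(&prefix) {
                // create_dir_all also succeeds if the directory already
                // exists, e.g. because another writer just created it.
                fs::create_dir_all(&dir)?;
                self.known_prefixes.insert(prefix);
                created = true;
            }
        }
        fs::write(dir.join(hash_hex), data)?;
        Ok(created)
    }
}
```

The cache only ever grows, so this assumes nothing deletes prefix directories while a backup is running; if something might, a failed write could fall back to creating the directory and retrying.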

Options 1 and 2 come first because they are direct efficiency wins that don't require a format change, a guess at what's likely to be optimal in every situation, or making the user guess.
