Performance of blockdir prefix directories #180

One more thought from #177, cc @road2react and @WolverinDEV:

Conserve's current format puts blocks into subdirectories with a 3-hex-digit name, taken from the first 12 bits of the hash. So there are up to `1<<12` or 4096 of them. This introduces a blocking `mkdir` ahead of writing each block file.
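To make the layout concrete, here is a minimal sketch of how a hex-encoded hash might map to its path inside the block directory. The names `PREFIX_HEX_DIGITS` and `block_relpath` are made up for illustration and are not Conserve's actual API; the sketch also assumes the hash is an ASCII hex string at least three characters long.

```rust
use std::path::{Path, PathBuf};

/// Number of leading hex digits used as the subdirectory name.
/// Three hex digits carry 12 bits, giving at most 16^3 = 4096 subdirectories.
/// (Hypothetical constant, for illustration only.)
const PREFIX_HEX_DIGITS: usize = 3;

/// Returns the relative path `<prefix>/<hash>` for a block.
/// Assumes `hex_hash` is ASCII hex and at least PREFIX_HEX_DIGITS long;
/// a shorter string would panic on the slice below.
fn block_relpath(hex_hash: &str) -> PathBuf {
    let prefix = &hex_hash[..PREFIX_HEX_DIGITS];
    Path::new(prefix).join(hex_hash)
}
```

With this layout, a hash beginning `abc...` lands in subdirectory `abc`, and that directory has to exist before the block file can be written, which is where the extra `mkdir` comes from.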

The point of this is to reduce the size of any single directory, although that is probably less of a concern on most local filesystems than in years past. It may actually help with rclone/Box, if the client regularly reads whole directories. It may still be a good idea for VFAT USB drives.

It's probably a loss on scalable local filesystems? In particular, walking the list of blocks needs to read up to 4096 directories.

There are several options, in order of priority:

1. Remember which subdirectories are known to exist (because we already wrote or saw a block in them) and then there's no need to create them.
2. In addition, at the start of a backup, read the block directory to see which prefixes are present and remember them. This has the added benefit of quickly answering whether a given hash can possibly be present.
3. Make it tunable so that we can at least experiment with different settings, where 0 means no subdirectories. (It should be stored in some archive metadata. It may not be worth allowing this to be changed once the archive exists.)

I mention the first two first because they are direct efficiency wins that don't require a format change or guessing what's likely to be optimal in any situation, or making the user guess.
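A rough sketch of what options 1 and 2 could look like together. `PrefixCache` and its methods are hypothetical names, not existing Conserve code, and this assumes a local filesystem reachable through `std::fs` rather than Conserve's transport layer.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical in-memory record of prefix subdirectories known to exist.
struct PrefixCache {
    root: PathBuf,
    known: HashSet<String>,
}

impl PrefixCache {
    /// Option 2: list the block directory once at the start of a backup
    /// and remember every subdirectory name found there.
    fn scan(root: &Path) -> io::Result<PrefixCache> {
        let mut known = HashSet::new();
        for entry in fs::read_dir(root)? {
            let entry = entry?;
            if entry.file_type()?.is_dir() {
                // Names that are not valid UTF-8 are skipped in this sketch.
                if let Some(name) = entry.file_name().to_str() {
                    known.insert(name.to_owned());
                }
            }
        }
        Ok(PrefixCache { root: root.to_owned(), known })
    }

    /// Option 1: create the subdirectory only if it is not already known.
    /// Returns true if a directory creation call was actually issued.
    fn ensure(&mut self, prefix: &str) -> io::Result<bool> {
        if self.known.contains(prefix) {
            return Ok(false);
        }
        fs::create_dir_all(self.root.join(prefix))?;
        self.known.insert(prefix.to_owned());
        Ok(true)
    }

    /// If the prefix is not known, no block with that prefix was present
    /// when the directory was scanned or written by this process.
    fn may_contain(&self, prefix: &str) -> bool {
        self.known.contains(prefix)
    }
}
```

The cache can go stale if something else removes a subdirectory while a backup is running, so a real implementation would presumably still need to handle a not-found error on the block write by recreating the directory and retrying.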
