One more thought from #177, cc @road2react and @WolverinDEV:
Conserve's current format puts blocks into subdirectories with a 3-hex-digit name, taken from the first 12 bits of the hash, so there are up to `1<<12` or 4096 of them. This introduces a blocking `mkdir` ahead of writing each block file.
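To make the arithmetic concrete, here is a minimal sketch of how a subdirectory name could be derived from a hex-encoded hash. The names `SUBDIR_HEX_DIGITS`, `block_subdir` and `max_subdirs` are made up for illustration and are not Conserve's actual API. Each hex digit carries 4 bits, so 3 digits give 12 bits and at most 4096 directories.

```rust
/// Number of leading hex digits used as the subdirectory name.
/// (Hypothetical constant for this sketch; 3 matches the current format.)
const SUBDIR_HEX_DIGITS: usize = 3;

/// Return the subdirectory name for a block, given its hash as a hex string.
/// Returns None if the hash string is shorter than the prefix length.
fn block_subdir(hex_hash: &str) -> Option<&str> {
    hex_hash.get(..SUBDIR_HEX_DIGITS)
}

/// Upper bound on the number of subdirectories for a given prefix length:
/// each hex digit contributes 4 bits, so the bound is 1 << (4 * digits).
fn max_subdirs(hex_digits: usize) -> u64 {
    1u64 << (4 * hex_digits)
}
```

With a prefix length of 0 the bound is 1, which corresponds to keeping all blocks in a single directory.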
The point of this is to reduce the size of any single directory, although that is probably less of a concern on most local filesystems than in years past. It may actually help with rclone/Box, if the client regularly reads whole directories. It may still be a good idea for VFAT USB drives.
It's probably a loss on scalable local filesystems? In particular walking the list of blocks needs to read up to 4096 directories.
There are several options, in order of priority:
- Remember which subdirectories are known to exist (because we already wrote or saw a block in them) and then there's no need to create them.
- In addition, at the start of a backup, read the block directory to see which prefixes are present and remember them. This has the added benefit of quickly answering whether a given hash can possibly be present.
- Make it tunable so that we can at least experiment with different settings, where 0 means no subdirectories. (It should be stored in some archive metadata. It may not be worth allowing this to be changed once the archive exists.)
I mention the first two first because they are direct efficiency wins that don't require a format change or guessing what's likely to be optimal in any situation, or making the user guess.
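A rough sketch of the first two options, assuming a plain local directory and hex-encoded hashes. `SubdirCache` and its methods are hypothetical names, not existing Conserve code, and the sketch assumes nothing else removes subdirectories while the backup runs.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Prefix length in hex digits (assumed to be 3, as in the current format).
const PREFIX_HEX_DIGITS: usize = 3;

/// Hypothetical helper that remembers which block subdirectories exist.
struct SubdirCache {
    root: PathBuf,
    known: HashSet<String>,
}

impl SubdirCache {
    /// Read the block directory once and remember the subdirectories present.
    fn scan(root: &Path) -> io::Result<Self> {
        let mut known = HashSet::new();
        for entry in fs::read_dir(root)? {
            let entry = entry?;
            if entry.file_type()?.is_dir() {
                // Names that are not valid UTF-8 cannot be hex prefixes; skip them.
                if let Some(name) = entry.file_name().to_str() {
                    known.insert(name.to_owned());
                }
            }
        }
        Ok(SubdirCache { root: root.to_owned(), known })
    }

    /// Quick negative check: false means no block with this hash can be
    /// present, because its subdirectory does not exist.
    fn may_contain(&self, hex_hash: &str) -> bool {
        hex_hash
            .get(..PREFIX_HEX_DIGITS)
            .map_or(false, |prefix| self.known.contains(prefix))
    }

    /// Make sure the subdirectory for this hash exists, skipping the mkdir
    /// when it is already known. Returns true if a mkdir was actually issued.
    fn ensure_subdir(&mut self, hex_hash: &str) -> io::Result<bool> {
        let prefix = hex_hash.get(..PREFIX_HEX_DIGITS).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "hash too short")
        })?;
        if self.known.contains(prefix) {
            return Ok(false);
        }
        fs::create_dir_all(self.root.join(prefix))?;
        self.known.insert(prefix.to_owned());
        Ok(true)
    }
}
```

After the initial scan, writing a block costs at most one `mkdir` per prefix for the whole backup, and `may_contain` can rule out a hash without touching the filesystem.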