thincow enables non-destructive, copy-on-write edits of block devices involving copying/moving a lot of data around, using deduplication for minimal additional disk usage.
Use cases include:
- Simple-to-use deduplication layer
- Trying dangerous operations on filesystems, such as invasive repair or conversion
- Reshaping storage volumes or filesystems in order to prepare to move them (block-wise) to new media
- Thinly provisioned Copy-on-Write store
- Inline block-based deduplication
- Extent (B-tree) based data representation
- Fault tolerant (errors from upstream devices are propagated to FUSE client)
- Optional incremental flushing
- Optional integrity verification
-
Create a directory containing the upstream device(s), which will contain the initial data. You can use symlinks to add block devices from
/dev, or sparse files to create virtual null-backed devices. -
Create data (and optionally metadata) directories for
thincow's use. -
Create an empty directory to serve as the
thincowmount point. -
Run
thincow:$ ./thincow \ --upstream=path/to/upstream/dir \ --data-dir=path/to/data/dir \ path/to/mount-pointSee the man page or run
thincow --helpto get a full description of all options. -
You can now access and modify the data in the files under the specified mount-point. Modifications will be deduplicated and stored to
thincow's data files. -
To flush changes back to the upstream devices, you can run
echo full > path/to/mount-point/flush. See Flushing for details. -
When done, run
fusermount -u path/to/mount-pointto shut downthincowgracefully.
thincow takes as input (the --upstream parameter) a directory containing the "upstream" block devices (or symlinks to them, or plain files).
These are never written to, and thincow assumes that they won't change for the lifetime of its data/metadata storage.
The --data-dir and --metadata-dir parameters indicate where to save modified data. (Files in --metadata-dir are smaller and may thus be placed on faster media.)
See the Data Files section for a description of each file.
Finally, thincow allows accessing and manipulating the above via a FUSE filesystem, specified as a target directory parameter.
The exposed filesystem contains a devs directory, whose file list mirrors that of the upstream device directory.
Reading from these files initially produces the same contents as on the upstream devices. When a block is read for the first time, thincow makes note of its contents so that it can recognize when a block with the same contents is written back elsewhere later.
Writing to these files will cause thincow to deduplicate blocks according to its hash table of the blocks it has seen so far:
- If a block was seen (during reading) on an upstream device, then it is mapped to said block on the upstream device.
- Otherwise, if it matches a block that's already in the COW store, it is mapped to that block (and its reference counter is incremented).
- Otherwise, it is saved as a new block to the COW store.
-
thincowcan be used purely for deduplication of new data by placing a large empty (sparse) file (e.g. usingtruncateordd seek=...) in the upstream directory. All writes to the file will be saved and deduplicated to the COW store, in thincow's data directory. -
thincowcan be stopped and resumed using the same data / metadata store, assuming it is always stopped gracefully (umountorfusermount -u). Indicating an already populated data/metadata directory with--data-dir/--metadata-dirwill causethincowto use the data there. The underlying (upstream) devices must also not change between invocations. -
It is not safe to resume after an ungraceful termination or power failure. See Freezing for instructions for how to safely create periodic backups during long-running operations.
For a small practical example, see the t/t-9_btrfs-1_raid10_to_single.sh test.
The goal of the test is to model converting a btrfs RAID10 filesystem of four devices to a single, larger device with a single profile.
-
In the test, we set up the btrfs filesystem (consisting of four 128 MB devices) and fill it with 150 MB of data.
-
Then, we add an empty file to serve as a logical target for the destination device.
-
After starting
thincow, the four source devices are mounted, then the target device is added to the filesystem. -
After a balance, the four original devices can be removed from a filesystem, effectively forcing the data to be moved to the new device.
-
At the end, we verify that less than 2 MB of real disk space was used for the above conversion, despite the operation moving around at least 150 MB of data. This can also be seen by checking
stats-full.txtunder the mount point:Block size: 4096 Total blocks: 212992 (872415232 bytes) B-tree nodes: 36 (147456 bytes) Current B-tree root: 1 Devices: Device #0: "single", 335544320 bytes, first block: 0 Device #1: "d", 134217728 bytes, first block: 81920 Device #2: "c", 134217728 bytes, first block: 114688 Device #3: "b", 134217728 bytes, first block: 147456 Device #4: "a", 134217728 bytes, first block: 180224 Hash table: 16384 buckets (1048576 bytes), 8 entries per bucket (64 bytes) Hash table occupancy: 38876/131072 (29%) 0 slots: ##############.......................... 1521 buckets 1 slots: ##################################...... 3658 buckets 2 slots: ######################################## 4249 buckets 3 slots: ###############################......... 3399 buckets 4 slots: ###################..................... 2032 buckets 5 slots: #########............................... 959 buckets 6 slots: ###..................................... 410 buckets 7 slots: #....................................... 108 buckets 8 slots: ........................................ 48 buckets Blocks written: 42015 total, 41049 (168136704 bytes, 98%) deduplicated COW store: 420 blocks (1720320 bytes), 0 free for reuse Total COW references: 1058 blocks (4333568 bytes) Disk space savings: Deduplicated to upstream: 40059 blocks (164081664 bytes) Deduplicated to COW store: 639 blocks (2617344 bytes) Total: 40698 blocks (166699008 bytes)
A mounted thincow filesystem will contain the following entries:
/devs/- directory containing the virtual block devices. Writes to these will be deduplicated./debug/- directory containing virtual files which produce a dump of internal data structures. Note that these can be very large and take a long time to produce for large instances./stats.txt- text file containing information about thethincowinstance. This file only contains information which can be obtained quickly (O(1))./stats-full.txt- text file containing information about thethincowinstance. This file contains all information, and may take a long time to produce for large instances./flush- control file to facilitate flushing.
thincow creates the following files in its data / metadata directories:
blockmapis a B-tree holding information for where a block is stored. Information is organized into extents (a continuous range of blocks). Each extent can point to one of:- An extent from the upstream device (not necessarily at the same position as this element).
- An extent in the COW store, for blocks containing data which was never seen on an upstream device.
hashtablecontains a hash table of all blocks seen so far. Elements have the same format as inblockmap.cowmapcontains an index for the COW block store. It is used to store a reference count for every block, so thatthincowknows when a COW block can be freed.globalsis a fixed-length record containing some global variables, such as the B-tree size.cowdata(stored in the--data-dirdirectory) holds the contents of COW blocks (blocks containing data which was never seen on an upstream device). Each block's contents is unique (insofar as the hash table capacity allows).checksumswill contain the per-block checksums, if--checksum-bitsis enabled.
Note that the above files are created as large sparse files, big enough to accommodate the worst case, but initially don't consume any real disk space.
thincow exposes a /flush control file to facilitate writing changes back to the upstream devices.
This allows using thincow to preview the outcome of a potentially destructive operation:
perform the operation, examine the result, and then flush the changes back to the physical device.
Reading from /flush produces one of the following strings:
clean- there are no changes to flush.ready- there are changes, andthincowis ready to flush them.error, followed by a newline, followed by an error message - the last flush operation failed due to an error.disabled- flushing is disabled becausethincowwas run with--read-only-upstream.
Flushing is done by writing a command to /flush. Accepted commands are:
full- flush everything in one go.max- flush one pass. (Multiple passes may be necessary in case of data dependencies.)- decimal number - flush at most this many blocks (block size is 512 by default).
When thincow receives the SIGTSTP signal, it waits for the current operation to finish and clears the dirty bit before suspending. This allows safely saving a snapshot of its data and metadata without stopping thincow or unmounting any filesystems mounted on top of thincow.
-
Start
thincowwith the--pid-fileoption.$ thincow ... --pid-file=thincow.pid ... -
To freeze thincow:
-
First, you may want to sync / freeze / unmount any filesystems running on top of thincow, starting from inner-most to outer-most.
-
Send
SIGTSTPto the PID contained in the PID file specified by the--pid-fileoption.$ kill -TSTP $(cat thincow.pid) -
Wait until the thincow process enters the
T (stopped)state.$ while [ "$(awk '/^State:/{print $2}' /proc/$(cat thincow.pid)/status)" != T ] ; do sleep 1 ; done -
It is now safe to make a copy of thincow's data and metadata directories.
-
To resume thincow, send
SIGCONTin the same way:$ kill -CONT $(cat thincow.pid) -
Unfreeze / remount any filesystems that were running on top of thincow.
-
- Install a D compiler
- Install Dub, if it wasn't included with your D compiler
- Run
dub build -b release
thincow is available under the Mozilla Public License, v. 2.0.