Sharding: Reducing Load on the Filesystem

William Silversmith edited this page Apr 11, 2020 · 28 revisions

Motivation

In the initial implementation of Precomputed, the primary format CloudVolume and Neuroglancer support, very large datasets generate a commensurately large number of files. Image datasets, depending on how they are divided into a regular grid of files ("chunked"), can produce tens or hundreds of millions of files per layer. Data derived from the labels in the image, such as meshes and skeletons, range into the billions of files, since at minimum they create one file per label per processed region. This becomes quite taxing on filesystems and object storage systems. In object storage systems, file metadata are frequently replicated three to five times to ensure integrity via voting.

With cloud storage vendors, this complication is passed on to customers in the form of a cost per file created. It is not immediately clear how closely this price reflects the actual cost of servicing requests, but as of this writing, typical rates are $5 per million files created and $4 per ten million files read. For file creation, the costs become substantial at around the 100 million file mark ($500), and serious at the 1 billion file mark ($5,000). The dataset size at which these considerations start to become important is somewhere around 100 TB of uint8 images.
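As a rough illustration of how these numbers arise, the arithmetic can be sketched as follows. The 64³ chunk size is an assumption chosen for this example; actual chunk dimensions vary by dataset, and real file counts are lower when chunks are compressed or larger.

```python
# Back-of-the-envelope file count and creation cost for a chunked uint8 image.
# Chunk size (64^3 voxels) is an assumed example; the $5-per-million-files
# price is the figure quoted above.
dataset_bytes = 100 * 10**12              # 100 TB, one byte per uint8 voxel
chunk_bytes = 64 ** 3                     # one 64x64x64 chunk, uncompressed
num_files = dataset_bytes // chunk_bytes  # one file per chunk
creation_cost = (num_files / 10**6) * 5   # dollars, at $5 per million files

print(num_files)              # 381469726 -- hundreds of millions of files
print(round(creation_cost))   # 1907 -- nearly $2,000 just to write them
```

Even at this single resolution level, the file count lands well past the 100 million mark, before meshes and skeletons multiply it further.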

Therefore, to remain cost efficient when producing these large datasets, we need a way of compacting files together while retaining random access to individual files.

The Shard File

The shard file format is a container for arbitrary files, typically used for images, meshes, and skeletons. It groups a set of files together based on a hash of their file names and stores them so that each can be retrieved in isolation using partial (byte-range) reads. An individual file is located by consulting two levels of indices. The first index sits at the beginning of the shard file and has a fixed size, so a reader knows in advance how to access it. This first read points to a second index containing an arbitrarily sized listing of segment ids and byte ranges. A third read then fetches the requested data directly.
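The three-read lookup above can be sketched in miniature. The byte layout here (four buckets, little-endian uint64 fields) is invented purely for illustration and is NOT the actual sharded format; the real layout, hashing, and encoding are defined by the sharding specification linked below.

```python
import struct

NUM_BUCKETS = 4                  # fixed, so the first index has a known size
PAIR = struct.Struct("<QQ")      # (offset, length) of one bucket's index
TRIPLE = struct.Struct("<QQQ")   # (label id, data offset, data length)

def build_shard(files):
    """Pack {label_id: bytes} into one blob: fixed index, bucket indexes, data."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for label in sorted(files):
        buckets[label % NUM_BUCKETS].append(label)   # toy "hash": modulo

    fixed_size = NUM_BUCKETS * PAIR.size
    data_start = fixed_size + sum(len(b) * TRIPLE.size for b in buckets)
    index_blobs, data_blob = [], bytearray()
    for bucket in buckets:
        idx = bytearray()
        for label in bucket:
            payload = files[label]
            idx += TRIPLE.pack(label, data_start + len(data_blob), len(payload))
            data_blob += payload
        index_blobs.append(bytes(idx))

    fixed, cursor = bytearray(), fixed_size
    for idx in index_blobs:
        fixed += PAIR.pack(cursor, len(idx))
        cursor += len(idx)
    return bytes(fixed) + b"".join(index_blobs) + bytes(data_blob)

def read_file(shard, label):
    """Locate one label's bytes using three (simulated) ranged reads."""
    # Read 1: the fixed-size index at the start of the shard.
    bucket = label % NUM_BUCKETS
    off, length = PAIR.unpack_from(shard, bucket * PAIR.size)
    # Read 2: that bucket's variable-size listing of ids and byte ranges.
    idx = shard[off:off + length]
    for i in range(0, len(idx), TRIPLE.size):
        lbl, doff, dlen = TRIPLE.unpack_from(idx, i)
        if lbl == label:
            # Read 3: the file's bytes themselves.
            return shard[doff:doff + dlen]
    raise KeyError(label)

shard = build_shard({7: b"mesh-7", 11: b"mesh-11", 8: b"mesh-8"})
print(read_file(shard, 11))  # b'mesh-11'
```

On object storage, each of the three reads would be an HTTP range request rather than a slice, which is what makes random access possible without downloading the whole shard.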

See the sharding specification for implementation details.

References

  1. J. Maitin-Shepard. Sharded Format Specification. GitHub. Accessed April 2020. (link)