Performance Guide FAQ

CloudVolume has a variety of subsystems and use cases, each of which may have its own optimizations for operating at scale. This article describes how to use CloudVolume optimally for each use case.

Downloading a Set of Single Voxels

Naïvely, downloading a single voxel from any data type will be fairly slow. The image chunk must be downloaded, decoded, and then the point extracted. That's a lot of extra work performed in serial.

# Naive, slow pseudocode for downloading a set of points
from cloudvolume import CloudVolume

cv = CloudVolume(...)
pts = [ [100, 2012, 1113], [291, 3838, 120], ... ]

for x, y, z in pts:
    # each iteration downloads and decodes an entire chunk just to read one voxel
    label = cv[x, y, z][0, 0, 0, 0]  # slow

Here are four optimizations that will make this process much faster.

  1. Chunk Size Optimization: Use an appropriate chunk size and compression codec when writing the image.
  2. Concurrency: Download the required chunks in parallel.
  3. Caching: Retain frequently used chunks locally either in-memory or on-disk.
  4. Efficient Decoding: Exploit the structure of the compressed file to avoid unnecessary decoding work.

Chunk size optimization requires re-writing the volume, which isn't practical for most users, so I'll stick to a discussion of the other tactics.

cv.scattered_points implements concurrency, caching, and efficient decoding when available. It uses multiple threads to fetch image chunks, uses a cache (if you've configured one) to avoid re-downloading a chunk that multiple points fall into, and applies efficient decoding if the underlying format supports it.

cv = CloudVolume(..., lru_bytes=int(200e6)) # in-memory Least Recently Used cache
pts = [ ... ]
results = cv.scattered_points(pts)
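
If you also want fetched chunks to persist across runs, the on-disk cache from optimization 3 can be enabled alongside the in-memory LRU. A minimal sketch, assuming the library's default cache location is acceptable and that combining both caches suits your access pattern:

# On-disk cache plus in-memory LRU; cached chunks survive between processes.
# By default the disk cache lives under ~/.cloudvolume/cache/.
cv = CloudVolume(..., cache=True, lru_bytes=int(200e6))
pts = [ ... ]
results = cv.scattered_points(pts)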

If you don't specify lru_bytes up-front, the LRU will not be used. The LRU stores image chunks in encoded form, but with the bitstream compression codec (e.g. gzip, brotli) stripped. This avoids some repeated decompression work and allows the structure of some encodings to be exploited. However, for raw encoding, a large amount of memory will be used for each chunk.

Efficient decoding is supported for:

  * raw (trivial)
  * compressed_segmentation (high efficiency)
  * crackle (z-efficient)
  * compresso (z-efficient)

"High efficiency" means a single voxel can be extracted without additional work. "z-efficient" means only a single z-slice needs to be decoded, so a 128x128x64 chunk would be decoded, to a first approximation, 64x faster. With additional work, efficient decoding could be extended to fpzip, zfpc, jpeg, and jxl (at least under some settings).

You can improve decoding speed at the expense of memory usage by specifying CloudVolume(..., lru_encoding='raw'). You can also improve LRU capacity, at a slight expense of decoding speed and the cost of a recompression cycle, by specifying lru_encoding='crackle'. Of course, both of these tips assume that the native encoding is something else (e.g. compressed_segmentation, which is the most common).
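
Concretely, the two settings look like the following sketch; which one to pick depends on whether you are memory-bound or CPU-bound:

# Favor decoding speed: keep LRU chunks as raw arrays (more memory per chunk).
cv = CloudVolume(..., lru_bytes=int(200e6), lru_encoding='raw')

# Favor LRU capacity: recompress chunks to crackle before caching them
# (slightly slower decodes plus a one-time recompression cost per chunk).
cv = CloudVolume(..., lru_bytes=int(200e6), lru_encoding='crackle')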

It may be advantageous to sort your points before submitting them to cv.scattered_points so that nearby points are close to each other in the linear sequence. This takes better advantage of the LRU. Sorting on a single dimension is often enough, but in some situations it may make sense to reach for a clustering algorithm such as DBSCAN (an efficient implementation can be found here: https://github.com/wangyiqiu/dbscan-python). Better clustering allows you to use a smaller LRU.
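
A minimal sketch of a single-dimension sort, assuming the points are (x, y, z) triples; the sort key is an assumption and should be adjusted to whichever axis best groups your points into chunks:

# Sort so that nearby points appear consecutively, improving LRU hit rates.
pts.sort(key=lambda p: (p[2], p[1], p[0]))  # sort by z, then y, then x
results = cv.scattered_points(pts)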

For graphene volumes, using scattered_points is even more strongly recommended because the additional decoding requests to the PyChunkGraph server will be batched.
