But wait, isn't it inefficient to load tiny chunks? After all, [Dask recommends chunk sizes between 100 MB and 1 GB](https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes)! Modern SSDs are turning the tables: they can sustain over 1 million input/output operations per second (IOPS). And cloud storage looks like it is speeding up too. For example, see the recent announcement of [Amazon S3 Express One Zone](https://aws.amazon.com/blogs/aws/new-amazon-s3-express-one-zone-high-performance-storage-class/), and there may be [ways to get high performance from existing cloud storage buckets](https://github.com/JackKelly/light-speed-io/issues/10). One reason that Dask recommends large chunk sizes is that Dask's scheduler takes on the order of 1 ms to plan each task, so with tiny chunks the scheduling overhead would dwarf the time spent on I/O. LSIO's data processing should be faster (see below).
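To make the scheduler argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of LSIO or Dask. The function names, the 1 TB dataset, and the 4 kB chunk size are hypothetical choices for illustration, and it assumes one scheduler task and one I/O operation per chunk, ignoring bandwidth limits and parallel scheduling.

```python
def n_chunks(dataset_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to cover the dataset (ceiling division)."""
    return -(-dataset_bytes // chunk_bytes)


def scheduling_overhead_s(dataset_bytes: int, chunk_bytes: int,
                          per_task_us: int = 1_000) -> float:
    """Total scheduler planning time in seconds.

    Assumes one task per chunk and roughly 1 ms (1,000 us) per task,
    the order of magnitude quoted above for Dask's scheduler.
    """
    return n_chunks(dataset_bytes, chunk_bytes) * per_task_us / 1e6


def iops_bound_read_s(dataset_bytes: int, chunk_bytes: int,
                      iops: int = 1_000_000) -> float:
    """Lower bound on read time in seconds if each chunk costs one I/O operation.

    Assumes an SSD sustaining 1 million IOPS and ignores bandwidth limits.
    """
    return n_chunks(dataset_bytes, chunk_bytes) / iops


DATASET = 10**12        # hypothetical 1 TB dataset
LARGE_CHUNK = 10**8     # 100 MB, the low end of Dask's recommended range
TINY_CHUNK = 4_000      # hypothetical 4 kB chunk

for label, chunk in [("100 MB chunks", LARGE_CHUNK), ("4 kB chunks", TINY_CHUNK)]:
    sched = scheduling_overhead_s(DATASET, chunk)
    read = iops_bound_read_s(DATASET, chunk)
    print(f"{label}: scheduling {sched:,.0f} s, IOPS-bound read {read:,.2f} s")
```

Under these assumptions, 100 MB chunks cost about 10 s of scheduling for the whole dataset, while 4 kB chunks cost about 250,000 s of scheduling against roughly 250 s of IOPS-bound reading. That gap is why per-task overhead, rather than the storage device, pushes chunk sizes up.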