Thoughts on how Icechunk would handle remote to local replication #979
Hi, I'm the developer behind Pirate Weather, and I've been looking for a while for a better way to keep track of the constantly updating Zarr files that underpin the whole project. In short, as it stands, every hour an ingest container runs, converts some forecast GRIB files into a new Zarr, and I overwrite what's on disk via a copy to and from S3. This works, but it really isn't elegant. In a lot of ways Icechunk is perfect, and I'm so happy to have stumbled on it! I've done some tests and it's wonderfully drop-in, keeping all the Zarr ideas, which I really appreciate!

One thing I was wondering is how the format would react to being replicated from remote (S3) to local storage. If I'm writing to an S3 store in one container, rcloning the bucket to local disk in a second, and reading that local store in a third, would you expect the format to handle that gracefully? As it stands, raw Zarr files are surprisingly resilient to this sort of setup, but I'm not sure about Icechunk.

While reads directly off S3 do work, they're considerably slower (~0.05 seconds) and less consistent compared to almost instant reads off local disks, so I would love to get a bit more performance here.
Replies: 1 comment 3 replies
Hi @alexander0042. Glad you are enjoying Icechunk! It is ideal for your use case with frequent updates. A few thoughts:
So, I'd say go ahead and give it a try. Please report back any issues you find. Mandatory disclaimer: Icechunk is really optimized for object storage; if you bring the compute close to your bucket, it should be very fast. But of course, it cannot compete with local disk on latency, particularly if your chunks are small.
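For anyone who wants to try the setup described in the question, here is a minimal sketch of the replicate-then-read half, assuming `rclone` is installed and configured and that the `icechunk` and `zarr` Python packages are available. The remote name `s3remote:my-bucket/forecast`, the local path `/data/forecast`, and the helper function names are hypothetical placeholders, not anything from this thread. The sketch only shows how the pieces could be wired together; it does not establish whether a replica is safe to read while a sync is still in progress, which is exactly the open question here.

```python
import subprocess
from pathlib import Path


def rclone_sync_command(remote: str, local_dir: str) -> list[str]:
    """Build an `rclone sync` invocation that mirrors `remote` into `local_dir`."""
    return ["rclone", "sync", remote, local_dir]


def replicate(remote: str, local_dir: str) -> None:
    """Mirror the remote Icechunk repository onto local disk (hypothetical helper)."""
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if rclone exits with a non-zero status.
    subprocess.run(rclone_sync_command(remote, local_dir), check=True)


def open_local_replica(local_dir: str):
    """Open the local copy read-only and return the root Zarr group."""
    # Imported lazily so the helpers above work without these packages installed.
    import icechunk
    import zarr

    storage = icechunk.local_filesystem_storage(local_dir)
    repo = icechunk.Repository.open(storage)
    # Assumption: the writer commits to the default "main" branch.
    session = repo.readonly_session(branch="main")
    return zarr.open_group(session.store, mode="r")


if __name__ == "__main__":
    # Placeholder remote and path; substitute your own before running.
    replicate("s3remote:my-bucket/forecast", "/data/forecast")
    root = open_local_replica("/data/forecast")
    print(list(root.array_keys()))
```

In the three-container layout from the question, `replicate` would run in the sync container and `open_local_replica` in the reader; the writer is unchanged and keeps committing to the S3-backed repository.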