@shoyer I really like this approach, and I'd encourage people to continue this line of thinking. But there are a few details I want to bring up. The complex scenario on writes is when you have distributed writers writing concurrently in multiple different sessions: two or more users start different distributed writes using, for example, the same parent snapshot_id. In this scenario we cannot let the written chunks overwrite each other; each writer needs the guarantee that their chunks will be safe in object store before the commit starts. So, at a minimum, we would need to give each writer a "safe" prefix (or hash function, encoding, etc.) to put their chunks under. One trivial way to do that would be to make the manifest a table like the sketch below (I'm skipping over several details, like the node id, for clarity).
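A minimal sketch of that idea, with made-up names (`manifest`, `chunk_key`, the writer ids); the real Icechunk manifest format differs:

```python
# Hypothetical sketch, not Icechunk's actual format: a manifest table that
# maps each chunk's coordinates to the id of the writer that produced it.
# The object-store key is derived from both, so concurrent writers sharing
# a parent snapshot_id can never overwrite each other's chunks.
manifest = {
    # chunk coords -> writer id (the writer's "safe" prefix)
    (0, 0, 0): "writer-a",
    (0, 0, 1): "writer-a",
    (0, 0, 2): "writer-b",
}

def chunk_key(coords: tuple[int, ...], writer_id: str) -> str:
    # e.g. chunk_key((0, 0, 1), "writer-a") == "writer-a/0.0.1"
    return f"{writer_id}/{'.'.join(map(str, coords))}"
```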
That is in fact a small optimization, because a writer id would need fewer bytes than a chunk id. But it is not a huge optimization: we still need to list every chunk. Another approach would be to create an efficient way to encode the list of chunk coordinates that a writer wrote. With that we would go from one row per chunk to one row per chunk range, as in the sketch below. This is trivial for certain patterns of distributed writes, but not for the general case. I wanted to add two extra pieces of context.
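A hedged sketch of that range encoding in one dimension (`encode_ranges` is a hypothetical helper, not an Icechunk API):

```python
# Collapse the sorted 1-D chunk indices a writer touched into [start, stop)
# ranges: trivial for contiguous writes, but scattered writes degrade back
# toward one row per chunk.
def encode_ranges(chunk_indices: list[int]) -> list[tuple[int, int]]:
    ranges: list[tuple[int, int]] = []
    for i in sorted(chunk_indices):
        if ranges and ranges[-1][1] == i:
            ranges[-1] = (ranges[-1][0], i + 1)  # extend the current range
        else:
            ranges.append((i, i + 1))            # start a new range
    return ranges

# An append writing chunks 100..104, plus one repaired chunk at 42:
assert encode_ranges([100, 101, 102, 103, 104, 42]) == [(42, 43), (100, 105)]
```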
-
As I understand it, Icechunk currently stores references to chunks with something like the following:
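For illustration only (the keys and ids here are made up, not Icechunk's actual layout), something in the spirit of one manifest entry per chunk:

```python
# Illustrative per-chunk manifest: one entry for every chunk of the array,
# keyed by chunk coordinates. At tens of millions of chunks, the manifest
# itself becomes tens of millions of rows.
chunk_refs = {
    "temperature/0.0.0": "chunks/4f9a1c...",
    "temperature/0.0.1": "chunks/81c2e7...",
    # ... and so on, one row per chunk
}
```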
This will struggle to scale for very large datasets like those in ARCO-ERA5, which use 10s of millions of chunks.
Instead, what about storing records of writes with corresponding ranges of chunks, e.g.,
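A sketch of what such write records could look like, with hypothetical field names and versions:

```python
# Hypothetical write records: one row per write, each covering a contiguous
# range of chunk indices per dimension (start inclusive, stop exclusive).
write_records = [
    ("v042", {"time": (0, 1440), "lat": (0, 721), "lon": (0, 1440)}),    # initial backfill
    ("v043", {"time": (1440, 1464), "lat": (0, 721), "lon": (0, 1440)}),  # append one day
]
```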
Then raw chunks could be stored at `/{version}/{chunk_id}` or `/{version}/{hash(chunk_id)}`. This would keep metadata files quite small, so they could still be loaded into memory even for very large stores.

I think this would work well for the typical update patterns we encounter for weather/climate data, where we have O(1) writes per day and which update contiguous ranges of chunks, e.g., "append one weather forecast" or "update the last few days of data with additional quality controls".
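To make the read path concrete, a hedged sketch reusing the hypothetical `write_records` above: scan for the latest write whose range covers the requested chunk, then build the object key from that write's version (here `chunk_id` is simply the coordinate string, one plausible choice):

```python
def resolve_chunk(coords: dict[str, int], write_records) -> str:
    # Newest record first: the most recent write covering these coordinates
    # owns the current bytes for this chunk.
    for version, ranges in reversed(write_records):
        if all(ranges[dim][0] <= idx < ranges[dim][1] for dim, idx in coords.items()):
            chunk_id = ".".join(str(coords[dim]) for dim in sorted(coords))
            return f"/{version}/{chunk_id}"
    raise KeyError(f"chunk {coords} was never written")

# A chunk in the appended day resolves to the newer version:
# resolve_chunk({"time": 1450, "lat": 10, "lon": 10}, write_records)
# -> "/v043/10.10.1450"
```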