Writing a Zarr from multiple processes or threads #7125
Unanswered · alexgleith asked this question in Q&A
Hey folks
I'm just getting started writing Zarr files, and I have a question about how to handle creating a Zarr from many processes at once.
I have a script over here that takes a bunch of SST NetCDF files and combines them into a single Zarr. This fails on some files, I think because one thread is writing one of the index/metadata files while another thread is trying to write it too. I'll copy the important parts of the code in below.
The basic structure is informed by how I'd like to run this operationally, i.e., when we have a new daily file, we should be able to "process_one_file" and add it to the Zarr. But there might be more than one file, and the initial work is to combine 10 years or so of data, so 3,000–4,000 files.
I know we could use other methods, like `open_mfdataset`, to do this, but I want to exhaust my options with this idempotent method first! Is there a way we can utilise a shared lock with xarray's `to_zarr`, maybe? Or am I being naive here?!
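On the lock idea: as far as I can tell, `to_zarr` doesn't take a lock argument, so the coordination would have to live outside the call. Something like this POSIX-only sketch is what I have in mind (the lock-file path and the usage shown in the comment are hypothetical):

```python
# Hypothetical sketch: an exclusive cross-process file lock (POSIX-only,
# via fcntl.flock) held around the whole append, so only one writer
# touches the store's shared metadata at a time.
import fcntl
from contextlib import contextmanager

@contextmanager
def zarr_write_lock(lock_path="sst.zarr.lock"):  # made-up lock-file path
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other writer holds it
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

# In each process, something like:
# with zarr_write_lock():
#     ds.to_zarr("my_store.zarr", append_dim="time")
```

This serialises the appends, of course, which rather defeats the parallelism, so maybe region writes are the better direction.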