xarray to_zarr creates group when var has / in the name, is this defined behavior? #5584
@TomNicholas I'm addressing your comment on my HRRRZarr talk from AMS here for lack of a better discussion place. xarray-datatree does read the HRRRZarr data when using open_datatree, but the resulting datatree isn't useful, because HRRRZarr suffers from the issue I documented above, which makes it essentially a malformed xarray dataset. The hierarchy that xarray writes puts the coordinates at the root level but the data variable in a sub-subgroup, so reading it with datatree leaves you with a root dataset that has no data and a leaf dataset that has no coordinates (example below). I would welcome an enhancement to datatree that supports inheriting coordinates from a higher level in the hierarchy, if that makes sense, but it seems separate from datatree's intended use cases. (I actually don't see why xarray couldn't be enhanced to read its own datasets in this format anyway, since the slash in the variable name is cosmetic and shows up in the .zmetadata.)
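To make the coordinate-inheritance idea concrete, here is a rough sketch using plain nested dicts rather than datatree or xarray objects. The layout (`coords`, `data_vars`, `children` keys), the names `surface` and `TMP`, and the function `collect_with_inherited_coords` are all hypothetical and only stand in for a root group holding coordinates with the data in a sub-subgroup; this is not how datatree is implemented.

```python
def collect_with_inherited_coords(node, path="", inherited=None):
    """Walk a nested-dict tree and pair each data variable with the
    coordinates defined at or above its own level."""
    # Coordinates from ancestors, overridden by any defined on this node.
    coords = dict(inherited or {})
    coords.update(node.get("coords", {}))

    found = {}
    for name, values in node.get("data_vars", {}).items():
        full_name = f"{path}/{name}" if path else name
        found[full_name] = {"coords": coords, "values": values}

    # Recurse into subgroups, passing the accumulated coordinates down.
    for child_name, child in node.get("children", {}).items():
        child_path = f"{path}/{child_name}" if path else child_name
        found.update(collect_with_inherited_coords(child, child_path, coords))
    return found


# Hypothetical layout: coordinates at the root, data in a sub-subgroup.
tree = {
    "coords": {"x": [0, 1, 2]},
    "children": {
        "surface": {
            "children": {
                "TMP": {"data_vars": {"TMP": [280.0, 281.5, 279.9]}},
            },
        },
    },
}

result = collect_with_inherited_coords(tree)
print(result)
```

In this toy layout the leaf variable ends up paired with the root-level `x` coordinate, which is the kind of result the leaf dataset is missing when the hierarchy is read group by group.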
So say I create a dataset with a data variable that has a slash in the name:
Naively, I would expect to then be able to load it with open_dataset and get the same dataset back. Instead, it has created a zarr group and the data is empty:
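A rough illustration of why a subgroup appears: zarr stores use "/" as the hierarchy separator in their keys, so a name containing a slash reads as a group path followed by an array name. The helper `split_store_key` and the variable name `surface/TMP` below are hypothetical and use only the standard library; this is a sketch of the naming effect, not xarray's or zarr's actual code path.

```python
def split_store_key(var_name):
    """Split a variable name into (group path, array name) the way a
    "/"-separated hierarchical store would interpret it."""
    group, _, array = var_name.rpartition("/")
    return group, array


# A plain name stays at the root; a name with a slash lands in a subgroup.
print(split_store_key("TMP"))          # ('', 'TMP')
print(split_store_key("surface/TMP"))  # ('surface', 'TMP')
```

Reading only the root group therefore finds the coordinates but not the array that was placed under `surface`.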
Of course I can load it with open_mfdataset passing two paths, but that still doesn't yield the original variable name:
I think what I'd expect or want to happen in this situation would be that I either get an error/warning when I try to write a variable with a slash in the name, or that it uses an escape character or something instead of writing a subgroup. Since my team is relying on this behavior now, though, my main concern is whether this is the defined/expected/intended behavior or something that might change in the future.
If it matters, my suggestion would be to retain this behavior but issue a warning that it's creating a zarr subgroup and that xarray doesn't support data hierarchies. The fact that xarray will write data hierarchies but has trouble reading them has been a problem for my team.
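Until something like that exists in xarray itself, a user-side check can approximate the suggested warning. The sketch below is an assumption about what such a check could look like; `check_variable_names` is a hypothetical helper, not an xarray function, and the warning text is illustrative.

```python
import warnings


def check_variable_names(names):
    """Warn about every variable name containing "/" and return those names."""
    offenders = [name for name in names if "/" in str(name)]
    for name in offenders:
        warnings.warn(
            f"variable {name!r} contains '/' and would be written as a "
            "zarr subgroup, which may not read back as a single dataset",
            UserWarning,
            stacklevel=2,
        )
    return offenders


# Hypothetical variable names; only the second one triggers a warning.
offenders = check_variable_names(["lat", "surface/TMP"])
print(offenders)
```

With a real dataset, the names could come from `list(ds.data_vars)` and the check would run just before the call to `to_zarr`.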
Edit:
For an example where it's a problem: