Curious how compression behaves when appending with `to_zarr`:

```python
import xarray as xr
import zarr as za

nlat = 700
nlon = 2000
nlev = 20
chunks3d = {'time': 1, 'feature': 1, 'lev': nlev, 'lat': nlat // 7, 'lon': nlon // 10}
chunks2d = {'time': 1, 'feature': 1, 'lev': 1, 'lat': nlat // 7, 'lon': nlon // 10}
compressor = za.Blosc(cname="zstd", clevel=9, shuffle=2)
g3d = 'f3d'
g2d = 'f2d'
store = 'test.zarr'
Append = False

# Each ts in datasets represents a unique variable, but is generically named 'varb3d'.
# Each varb3d has dims (time=1, feature=1, lev=nlev, lat=nlat, lon=nlon).
for ts in datasets:
    if Append:
        ts.chunk(chunks3d).to_zarr(store, group=g3d, append_dim='feature', consolidated=True)
    else:
        ts.chunk(chunks3d).to_zarr(store, group=g3d, consolidated=True,
                                   encoding={'varb3d': {"compressor": compressor}})
    Append = True
```

The above code works, but changing the compression level from 6 to 9 didn't reduce the size by much. So I'm wondering whether the first variable in the loop is actually compressed? It is not intuitive that it is. If I add the compression in the initial creation phase of the file, I get a key error. Curious whether the whole array is compressed during the append process, or whether only the appended arrays are compressed?
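
One way to check whether compression actually took effect, for the first variable and for everything appended afterwards, is to open the store with the zarr API and inspect the array's metadata. Below is a minimal sketch using the zarr-python 2.x API (matching the `za.Blosc` usage above) and assuming the `test.zarr` store, `f3d` group, and `varb3d` variable names from the snippet; the compressor is recorded in the Zarr array's metadata when the array is created, and chunks written by later `append_dim` calls are compressed with that same compressor.

```python
import zarr

# Open the existing group read-only and inspect the stored array's metadata.
grp = zarr.open_group('test.zarr', mode='r', path='f3d')
arr = grp['varb3d']

print(arr.compressor)                  # e.g. Blosc(cname='zstd', clevel=9, shuffle=BITSHUFFLE, ...)
print(arr.nbytes, arr.nbytes_stored)   # logical size vs. bytes on disk -> effective compression ratio
```

If `arr.compressor` shows the Blosc codec you passed in `encoding`, every chunk of that array, including the appended ones, is compressed with it.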
I don't know what your data source is, but in general, floating-point data doesn't compress well using lossless compression. I would not expect any meaningful size difference between clevel 6 and 9.
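
As a rough illustration of that point (a sketch, not the discussion's code: it uses synthetic random float32 data and calls `numcodecs` directly, and assumes a reasonably recent `numcodecs` is installed), raising `clevel` barely changes the compressed size, while reducing float precision, if that is acceptable for the data, does:

```python
import numpy as np
import numcodecs

# Synthetic noisy float data; real geophysical fields may compress somewhat better.
data = np.random.rand(1_000_000).astype('float32')

for clevel in (6, 9):
    codec = numcodecs.Blosc(cname='zstd', clevel=clevel, shuffle=numcodecs.Blosc.BITSHUFFLE)
    print(f"clevel={clevel}: {len(codec.encode(data))} bytes")

# Lossy alternative: keep ~10 mantissa bits before compressing (only if the lost
# precision is acceptable for the data); the rounded bytes compress far better.
rounded = numcodecs.BitRound(keepbits=10).encode(data)
codec = numcodecs.Blosc(cname='zstd', clevel=6, shuffle=numcodecs.Blosc.BITSHUFFLE)
print(f"bitround + zstd: {len(codec.encode(rounded))} bytes")
```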