How to prevent the `to_zarr` method in xarray from writing all-NaN chunks to disk?
#7451
I want to save a very large two-dimensional zarr file, chunked equally along both dimensions (X, X), that occasionally contains chunks made entirely of NaNs. To reduce the number of chunks written to disk, I want xarray's `to_zarr` method to skip writing such chunks entirely. Here is some code to emulate it:

```python
import numpy as np
import xarray as xr

n = 100        # this could get as large as 400K; keeping it small for simplicity
n_chunk = 50   # chunk size
n_delete = 1   # number of random chunks to change to NaNs

lat = np.linspace(1, 2, n)
lon = np.linspace(1, 2, n)
data = np.random.random((n, n))

# Enumerate all chunk indices and pick some at random to overwrite with NaNs.
all_c = list()
for i in np.arange(n // n_chunk):
    for j in np.arange(n // n_chunk):
        all_c.append((i, j))
delete = np.array(all_c)[np.random.choice(np.arange(len(all_c)), n_delete)]
print(np.unique(delete, axis=0).shape)  # axis=0: count unique (i, j) pairs

for i, j in delete:
    j = j if j - 1 > 0 else 1
    i = i if i - 1 > 0 else 1
    data[(i - 1) * n_chunk:i * n_chunk, (j - 1) * n_chunk:j * n_chunk] = np.nan

xarr = xr.DataArray(data=data, name="test", dims=["lat", "lon"],
                    coords=dict(lat=lat, lon=lon))
xarr = xarr.chunk((n_chunk, n_chunk))
xarr.to_dataset().to_zarr(r"C:/experiment.zarr", mode="w",
                          encoding={"test": {"_FillValue": None}})
```

This writes all the chunks to disk (in the case above, 4 chunks), since a chunk of all NaNs is still valid float data. How can I stop it from writing the all-NaN chunks?
Answered by rafa-guedes, Jan 18, 2023
Answer selected by
sh-dot-s
There is an encoding option `write_empty_chunks` that can be used for that: