Writing a netCDF file is slow #6921
Replies: 4 comments 1 reply
-
@lassiterdc, writing a large, chunked xarray dataset to a netCDF file is always a challenge and quite slow, since the write is serial. However, you could take advantage of the …
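The comment above is cut off, so the exact suggestion is unknown. One common way to work around the serial write, sketched here as an assumption rather than a record of the advice given, is `xarray.save_mfdataset`, which writes one file per dataset instead of funneling everything through a single `to_netcdf` call. The file pattern and grouping below are hypothetical:

```python
import glob
import xarray as xr

# Assumed inputs: the thread's radar files (path pattern is hypothetical)
files = sorted(glob.glob("radar_rainfall_20140628_*.nc"))
ds = xr.open_mfdataset(files, combine="nested", concat_dim="time")

# One output file per hour; writing many small files sidesteps the
# serial single-file netCDF write.
hours, datasets = zip(*ds.groupby("time.hour"))
paths = ["rainrate_hour_{:02d}.nc".format(h) for h in hours]
xr.save_mfdataset(list(datasets), list(paths), mode="w")
```

With `compute=False`, `save_mfdataset` instead returns a dask delayed object, so the writes can be scheduled together with other work and computed later.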
-
Thanks, @andersy005. I think that …
-
Great... keep us posted once you have a working solution. I'm going to convert this issue into a discussion instead.
-
Update on expediting the export. Data: same as in the original post.

```python
import time
import xarray as xr

# Same open as in the original post
ds_comb_frm_open_mfdataset = xr.open_mfdataset(
    files, chunks={"latitude": 3500, "longitude": 7000},
    concat_dim="time", preprocess=ds_preprocessing, combine="nested")
dates, lst_ds = zip(*ds_comb_frm_open_mfdataset.groupby("time.hour"))

# Write one hour of data without loading it into memory first
start_time = time.time()
ds = lst_ds[0]
ds.to_netcdf("ds_1hr_no_loading.nc", mode="w",
             encoding={"rainrate": {"zlib": True}})
print("Time to export 1 hour worth of data without first loading it "
      "into memory: {}".format(time.time() - start_time))

# Build the full file by loading each hour and appending it
start_time = time.time()
for i, ds in enumerate(lst_ds):
    if i == 0:
        ds.load().to_netcdf("ds_by_appending.nc", mode="w",
                            encoding={"rainrate": {"zlib": True}})
        print("Time to export 1 hour worth of data after first loading it "
              "into memory: {}".format(time.time() - start_time))
    else:
        ds.load().to_netcdf("ds_by_appending.nc", mode="a",
                            encoding={"rainrate": {"zlib": True}})
print("Time to export dataset created by appending netcdf: "
      "{}".format(time.time() - start_time))
```

This fails with:

```
MemoryError: Unable to allocate 2.74 GiB for an array with shape (30, 3500, 7000) and data type float32
```
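The allocation in the traceback matches one hourly group loaded eagerly: a float32 array of shape (30, 3500, 7000) needs 30 × 3500 × 7000 × 4 bytes ≈ 2.74 GiB, which is exactly what each `.load()` call asks for. A lower-memory alternative, sketched here as an untested assumption rather than a confirmed fix, is to skip `.load()` entirely and let dask stream each group's chunks to its own file via delayed writes (`ds_comb_frm_open_mfdataset` is the dataset opened in the code above):

```python
import dask

# Sketch: one lazy write per hourly group; compute=False makes to_netcdf
# return a dask delayed object instead of writing eagerly.
writes = [
    ds_hour.to_netcdf("rainrate_hour_{:02d}.nc".format(hour), mode="w",
                      encoding={"rainrate": {"zlib": True}}, compute=False)
    for hour, ds_hour in ds_comb_frm_open_mfdataset.groupby("time.hour")
]
# Writes proceed chunk by chunk instead of materializing a full hour in RAM
dask.compute(*writes)
```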
-
What is your issue?
This has been discussed in another thread, but the proposed solution there (first `.load()` the dataset into memory before running `to_netcdf`) does not work for me, since my dataset is too large to fit into memory. The following code takes around 8 hours to run. You'll notice that I tried both `xr.open_mfdataset` and `xr.concat` in case it would make a difference, but it doesn't. I also tried profiling the code according to this example. The results are in this html (dropbox link), but I'm not really sure what I'm looking at.
Data: dropbox link to 717 netCDF files containing radar rainfall data for 6/28/2014 over the United States, around 1 GB in total.
Code:
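The code block itself is not shown above. Based on the update earlier in the thread, it presumably resembled the following minimal sketch, where `files`, `ds_preprocessing`, and the output name are assumptions:

```python
import xarray as xr

# Assumed shape of the original code, reconstructed from the update above
ds = xr.open_mfdataset(files, chunks={"latitude": 3500, "longitude": 7000},
                       concat_dim="time", preprocess=ds_preprocessing,
                       combine="nested")
ds.to_netcdf("output.nc", mode="w", encoding={"rainrate": {"zlib": True}})
```

For the profiling mentioned above, the dask diagnostics tools produce exactly this kind of HTML report. A minimal sketch, assuming the default local scheduler:

```python
from dask.diagnostics import Profiler, ResourceProfiler, visualize

# Record task and memory/CPU activity while the write runs
with Profiler() as prof, ResourceProfiler() as rprof:
    ds.to_netcdf("output.nc", mode="w",
                 encoding={"rainrate": {"zlib": True}})
visualize([prof, rprof], filename="profile.html")  # writes an HTML report
```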