Hi there, I was directed to Icechunk from a thread on replacing LRUStoreCache from Zarr 2 (zarr-developers/zarr-python#2857). I'm looking at CachingConfig, which is the main reference to caching I see in Icechunk, and I'm not sure I understand it: it seems to be entirely for metadata rather than actual data. Does Icechunk handle in-memory caching of data chunks some other way, and can it be configured? My goal is pretty simple: keep some number of recently loaded chunks in memory to reduce I/O when I access data from the same chunks repeatedly in an xarray backend. Thanks!
Chunks are also cached (i.e. the num_bytes_chunks parameter). Here's an example of caching in action:
(Edited to fix typo)
import icechunk as ic
import zarr
storage = ic.s3_storage(
    bucket="earthmover-sample-data",
    prefix="icechunk/gfs/solar/2024-05-13T00:00:00-pcodec",
    anonymous=True,
    region="us-east-1",
)
config = ic.RepositoryConfig.default()
# set to zero to turn off caching
config.caching = ic.CachingConfig(num_bytes_chunks=int(1e9))
repo = ic.Repository.open(storage, config)
session = repo.readonly_session("main")
array = zarr.open_array(session.store, path="t2m")
%time array[0, 0, 0];
# -> 114 ms
%time array[0, 0, 0]
# -> 5 ms
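The %time lines above are IPython magics, so they only work in IPython or Jupyter. For a plain Python script, a rough equivalent could look like the sketch below. The helper name time_reads is hypothetical (it is not part of Icechunk or Zarr), and the timings it reports will vary with network conditions and cache state.

```python
import time
from typing import Any, Callable


def time_reads(read: Callable[[], Any], repeats: int = 2) -> list[float]:
    """Call `read` `repeats` times; return each call's wall-clock duration in seconds.

    Hypothetical helper for illustration only, not an Icechunk or Zarr API.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        read()
        durations.append(time.perf_counter() - start)
    return durations


# Usage with the array opened above (needs network access, so left commented out):
# cold, warm = time_reads(lambda: array[0, 0, 0])
# print(f"first read: {cold * 1e3:.0f} ms, repeat read: {warm * 1e3:.0f} ms")
```

With chunk caching enabled, the repeat read should be noticeably faster than the first one, as in the timings above.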