
How to apply a custom function on a dataset's chunks to reduce its size and save it on disk? #7913

Answered by TomNicholas
vdevauxchupin asked this question in Q&A

I want to apply a computationally intensive method that goes iteratively through every 2D pixel of my dataset (x,y) and works with their entire time dimension (mid_date). I thought the best way of doing that was to first download the dataset, then work on it locally.

It sounds more like you should do this:

  • Open the dataset lazily with a chunking scheme like {'x': 100, 'y': 100, 'time': -1} (i.e. the same as the chunks on disk, except for opening all time chunks as one contiguous chunk).
  • Then you want to write a function that can happily apply your analysis to a single chunk after loading just that chunk into memory. Because you want to apply it elementwise (to each pixel) you'll need to look into what np.vect…
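The two steps above can be sketched with `xarray.apply_ufunc`, which is the usual way to map a per-pixel function over every `(x, y)` location while handing it the full time dimension. This is only a minimal sketch: the dataset, variable name `v`, and the `per_pixel` analysis are hypothetical stand-ins, and the lazy-opening call is shown commented out.

```python
import numpy as np
import xarray as xr

# In practice you would open the stored dataset lazily, e.g.
# (path and chunk sizes here are hypothetical):
# ds = xr.open_zarr("cube.zarr",
#                   chunks={"x": 100, "y": 100, "mid_date": -1})

# Small in-memory stand-in so the sketch runs anywhere:
ds = xr.Dataset(
    {"v": (("mid_date", "y", "x"), np.arange(24.0).reshape(4, 2, 3))}
)

def per_pixel(ts):
    """Placeholder for the expensive analysis: receives the full 1-D
    time series of one (x, y) pixel and returns a scalar."""
    return ts.mean()

# vectorize=True loops per_pixel over every pixel; with a dask-backed
# dataset you would also pass dask="parallelized" and
# output_dtypes=[float] so each chunk is processed independently.
result = xr.apply_ufunc(
    per_pixel,
    ds["v"],
    input_core_dims=[["mid_date"]],
    vectorize=True,
)

print(result.dims)  # the time dimension has been reduced away
```

The reduced result can then be written back to disk, e.g. with `result.to_netcdf(...)` or `result.to_dataset(name="v").to_zarr(...)`.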

Answer selected by vdevauxchupin