`Dataset.to_array()` consists of broadcasting all data variables in the dataset against each other, then concatenating them along a new dimension into a new array while preserving coordinates. `DataArray.to_dataset()` performs the inverse operation.
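For concreteness, here is a minimal round-trip sketch (the variable names and shapes are made up):

```python
import numpy as np
import xarray as xr

# Toy dataset: two variables sharing a "time" dimension.
ds = xr.Dataset(
    {
        "temperature": ("time", np.random.rand(10)),
        "pressure": ("time", np.random.rand(10)),
    },
    coords={"time": np.arange(10)},
)

# Stack all data variables along a new "variable" dimension.
da = ds.to_array(dim="variable")  # dims: ("variable", "time")

# Split them back out into a Dataset (the inverse operation).
ds2 = da.to_dataset(dim="variable")
```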
I was wondering whether these two operations involve copying the data, or whether the reshaping is done by referencing the original values in the background.
When dealing with very large tensors lazily loaded with dask, every time `to_array()` and `to_dataset()` are called, am I effectively passing through all the data? I cannot find any documentation on this.
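One way I have tried to probe this myself is to check whether the result still wraps a dask array after conversion (sketch below; the file name and chunk sizes are placeholders):

```python
import xarray as xr

# Hypothetical file opened lazily with dask chunks.
ds = xr.open_dataset("data.nc", chunks={"time": 1000})

da = ds.to_array(dim="variable")

# If the conversion stays lazy, the result still wraps a dask array and
# nothing has been read yet; only .compute() / .load() would trigger IO.
print(type(da.data))  # expect a dask array if lazy
print(da.chunks)      # chunk layout of the still-unloaded result
```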
I often read a large DataArray with various variables stacked along a "feature" dimension, and then apply a function to a specific subset of that "feature" dimension. My first instinct is to convert the DataArray to a Dataset and then apply the function. But is this efficient when dealing with large DataArrays? Is smart referencing performed?
On the other hand, defining functions on DataArray subsets is quite ugly (see below).
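Something along these lines (a made-up illustration, assuming a "feature" dimension with labeled entries):

```python
import numpy as np
import xarray as xr

# Hypothetical DataArray with three variables stacked along "feature".
da = xr.DataArray(
    np.random.rand(3, 100),
    dims=("feature", "time"),
    coords={"feature": ["u", "v", "w"]},
)

# Option 1: convert to a Dataset first, then use named variables.
ds = da.to_dataset(dim="feature")
speed = np.hypot(ds["u"], ds["v"])

# Option 2: work on the DataArray directly -- every access needs an
# explicit .sel() on the "feature" dimension, which gets verbose fast.
speed = np.hypot(da.sel(feature="u"), da.sel(feature="v"))
```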
Thanks in advance for your answers :)