Commit 021c73e

hmaarrfk, pre-commit-ci[bot], and dcherian authored
Avoid loading entire dataset by getting the nbytes in an array (#7356)
* Avoid instantiating entire dataset by getting the nbytes in an array

  Using `.data` accidentally tries to load the whole lazy arrays into memory. Sad.

* DOC: Add release note for bugfix.
* Add test to ensure that number of bytes of sparse array is correctly reported
* Add suggested test using InaccessibleArray
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Remove duplicate test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
1 parent db68db6 commit 021c73e

3 files changed, +21 -2 lines changed

doc/whats-new.rst

Lines changed: 3 additions & 0 deletions
@@ -34,6 +34,9 @@ Deprecations
 
 Bug fixes
 ~~~~~~~~~
+
+- Accessing the property ``.nbytes`` of a DataArray, or Variable no longer
+  accidentally triggers loading the variable into memory.
 - Allow numpy-only objects in :py:func:`where` when ``keep_attrs=True`` (:issue:`7362`, :pull:`7364`).
   By `Sam Levang <https://github.com/slevang>`_.
 - add a ``keep_attrs`` parameter to :py:meth:`Dataset.pad`, :py:meth:`DataArray.pad`,

xarray/core/variable.py

Lines changed: 2 additions & 2 deletions
@@ -402,8 +402,8 @@ def nbytes(self) -> int:
         If the underlying data array does not include ``nbytes``, estimates
         the bytes consumed based on the ``size`` and ``dtype``.
         """
-        if hasattr(self.data, "nbytes"):
-            return self.data.nbytes
+        if hasattr(self._data, "nbytes"):
+            return self._data.nbytes
         else:
             return self.size * self.dtype.itemsize
 
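
The effect of switching from `self.data` to `self._data` can be illustrated with a deliberately simplified model. The sketch below does not use xarray: `LazyArray` and `ToyVariable` are hypothetical names, and the toy `data` property always materializes values, whereas xarray's real `Variable.data` handles more cases. It only shows why inspecting the wrapped object directly avoids a load, and how the `size * itemsize` fallback yields the byte count.

```python
class LazyArray:
    """Hypothetical stand-in for a lazily loaded backend array."""

    def __init__(self, shape, itemsize):
        self.shape = shape
        self.itemsize = itemsize
        self.loaded = False  # flips to True once values are materialized

    def load(self):
        # Pretend to read every value from disk or over the network.
        self.loaded = True
        size = 1
        for n in self.shape:
            size *= n
        return bytearray(size * self.itemsize)


class ToyVariable:
    """Toy container with a raw `_data` attribute and a loading `data` property."""

    def __init__(self, lazy):
        self._data = lazy

    @property
    def data(self):
        # Public accessor: materializes the values (the costly path).
        return self._data.load()

    @property
    def size(self):
        n = 1
        for dim in self._data.shape:
            n *= dim
        return n

    @property
    def nbytes(self):
        # Inspect the wrapped object directly; never go through `.data`.
        if hasattr(self._data, "nbytes"):
            return self._data.nbytes
        # Fallback estimate from metadata only.
        return self.size * self._data.itemsize


lazy = LazyArray(shape=(3, 3), itemsize=1)
var = ToyVariable(lazy)
print(var.nbytes, lazy.loaded)  # 9 False
```

In this model, writing `hasattr(self.data, "nbytes")` instead would call `load()` as a side effect of merely asking for the size, which is the kind of accidental load the commit message describes.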

xarray/tests/test_dataarray.py

Lines changed: 16 additions & 0 deletions
@@ -30,6 +30,7 @@
 from xarray.core.types import QueryEngineOptions, QueryParserOptions
 from xarray.core.utils import is_scalar
 from xarray.tests import (
+    InaccessibleArray,
     ReturnItem,
     assert_allclose,
     assert_array_equal,
@@ -3277,6 +3278,21 @@ def test_from_multiindex_series_sparse(self) -> None:
 
         np.testing.assert_equal(actual_coords, expected_coords)
 
+    def test_nbytes_does_not_load_data(self) -> None:
+        array = InaccessibleArray(np.zeros((3, 3), dtype="uint8"))
+        da = xr.DataArray(array, dims=["x", "y"])
+
+        # If xarray tries to instantiate the InaccessibleArray to compute
+        # nbytes, the following will raise an error.
+        # However, it should still be able to accurately give us information
+        # about the number of bytes from the metadata
+        assert da.nbytes == 9
+        # Here we confirm that this does not depend on array having the
+        # nbytes property, since it isn't really required by the array
+        # interface. nbytes is more a property of arrays that have been
+        # cast to numpy arrays.
+        assert not hasattr(array, "nbytes")
+
     def test_to_and_from_empty_series(self) -> None:
         # GH697
         expected = pd.Series([], dtype=np.float64)
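
The testing idea in the new test, a wrapper that exposes metadata but raises as soon as its values are read, can be sketched without xarray. The diff only shows `InaccessibleArray` being imported and used, so the class below is not its implementation: `GuardedArray`, `UnexpectedAccess`, `estimate_nbytes`, and `sum_values` are hypothetical names chosen for illustration.

```python
class UnexpectedAccess(Exception):
    """Raised when the test double's values are read."""


class GuardedArray:
    """Hypothetical test double: exposes metadata, refuses to hand out values."""

    def __init__(self, shape, itemsize):
        self.shape = shape
        self.itemsize = itemsize

    def __getitem__(self, key):
        raise UnexpectedAccess("values were requested")


def estimate_nbytes(arr):
    # Metadata-only estimate: number of elements times bytes per element.
    size = 1
    for n in arr.shape:
        size *= n
    return size * arr.itemsize


def sum_values(arr):
    # Reads the values, so it must trip the guard.
    return sum(arr[i] for i in range(arr.shape[0]))


guard = GuardedArray(shape=(3, 3), itemsize=1)
assert estimate_nbytes(guard) == 9   # 3 * 3 elements, 1 byte each
assert not hasattr(guard, "nbytes")  # the estimate did not rely on `nbytes`

try:
    sum_values(guard)
except UnexpectedAccess:
    tripped = True
else:
    tripped = False
print(tripped)  # True
```

A guard of this kind turns "did the code touch the data?" into a hard failure, so a regression that reintroduces a load shows up as an exception in the test rather than as silent extra memory use.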
