Commit caed274

"source" encoding for datasets opened from fsspec objects (#8923)
* draft for setting `source` from pre-opened `fsspec` file objects
* refactor to only import `fsspec` if we're actually going to check
  Could use `getattr(filename_or_obj, "path", filename_or_obj)` to avoid `isinstance` checks.
* replace with a simple `getattr` on `"path"`
* add a test
* whats-new entry
* open the file as a context manager
1 parent 42ed6d3 commit caed274

3 files changed: 22 additions, 2 deletions

doc/whats-new.rst

Lines changed: 2 additions & 0 deletions

@@ -24,6 +24,8 @@ New Features
 ~~~~~~~~~~~~
 - Allow chunking for arrays with duplicated dimension names (:issue:`8759`, :pull:`9099`).
   By `Martin Raspaud <https://github.com/mraspaud>`_.
+- Extract the source url from fsspec objects (:issue:`9142`, :pull:`8923`).
+  By `Justus Magin <https://github.com/keewis>`_.
 
 Breaking changes
 ~~~~~~~~~~~~~~~~

xarray/backends/api.py

Lines changed: 5 additions & 2 deletions

@@ -382,8 +382,11 @@ def _dataset_from_backend_dataset(
     ds.set_close(backend_ds._close)
 
     # Ensure source filename always stored in dataset object
-    if "source" not in ds.encoding and isinstance(filename_or_obj, (str, os.PathLike)):
-        ds.encoding["source"] = _normalize_path(filename_or_obj)
+    if "source" not in ds.encoding:
+        path = getattr(filename_or_obj, "path", filename_or_obj)
+
+        if isinstance(path, (str, os.PathLike)):
+            ds.encoding["source"] = _normalize_path(path)
 
     return ds
 
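
The attribute-fallback pattern in the hunk above can be tried outside xarray. The following is a minimal sketch, not the library's code: `resolve_source`, `normalize_path`, and `PathCarrier` are hypothetical names introduced for illustration, and `normalize_path` is only a simplified stand-in for xarray's private `_normalize_path`, assuming local paths.

```python
import io
import os


def normalize_path(path):
    # Simplified stand-in (assumption) for xarray's private _normalize_path:
    # turn str / os.PathLike into an absolute path string.
    return os.path.abspath(os.path.expanduser(os.fspath(path)))


def resolve_source(filename_or_obj):
    # Use the object's "path" attribute when it has one,
    # otherwise fall back to the object itself.
    path = getattr(filename_or_obj, "path", filename_or_obj)

    # Only strings and path-like objects yield a source.
    if isinstance(path, (str, os.PathLike)):
        return normalize_path(path)
    return None


class PathCarrier:
    # Hypothetical file-like object that exposes the path it was opened
    # from, standing in for a pre-opened fsspec file object.
    def __init__(self, path):
        self.path = path


from_str = resolve_source("data.nc")
from_obj = resolve_source(PathCarrier("data.nc"))
from_buffer = resolve_source(io.BytesIO(b""))  # no "path" attribute, not path-like
```

With this shape, plain strings and `os.PathLike` values take the same branch as before the change, objects carrying a string `path` now also resolve, and in-memory buffers still produce no source.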

xarray/tests/test_backends.py

Lines changed: 15 additions & 0 deletions

@@ -5151,6 +5151,21 @@ def test_source_encoding_always_present_with_pathlib() -> None:
             assert ds.encoding["source"] == tmp
 
 
+@requires_h5netcdf
+@requires_fsspec
+def test_source_encoding_always_present_with_fsspec() -> None:
+    import fsspec
+
+    rnddata = np.random.randn(10)
+    original = Dataset({"foo": ("x", rnddata)})
+    with create_tmp_file() as tmp:
+        original.to_netcdf(tmp)
+
+        fs = fsspec.filesystem("file")
+        with fs.open(tmp) as f, open_dataset(f) as ds:
+            assert ds.encoding["source"] == tmp
+
+
 def _assert_no_dates_out_of_range_warning(record):
     undesired_message = "dates out of range"
     for warning in record:
