
Commit 71372c1

noahbenson, pre-commit-ci[bot], and max-sixty authored
Docstring and documentation improvement for the Dataset class (#8973)
* Updates the example in the doc-string for the Dataset class to be clearer.

  The example in the doc-string of the `Dataset` class prior to this commit uses an example array whose size is `2 x 2 x 3` with the first two dimensions labeled `"x"` and `"y"` and the final dimension labeled `"time"`. This was confusing due to the fact that `"x"` and `"y"` are just arbitrary names for these axes and that no reason is given for the data to be organized in a `2x2x3` array instead of a `2x2` matrix. This commit clarifies the example.

  See issue #8970 for more information.

* Updates the documentation of the Dataset class to have clearer examples.

  These changes to the documentation bring it into alignment with the changes to the `Dataset` doc-string committed previously.

  See issue #8970 for more information.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Adds dataset size reports to the output of the example in the Dataset docstring.

* Fixes the documentation errors in the previous commits.

* Fixes indentation errors in the docs for previous commits.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
1 parent 08e43b9 commit 71372c1
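The reasoning in the commit message (that `"x"` and `"y"` are arbitrary axis names, and that a three-dimensional array needs a stated reason) can be illustrated with a minimal sketch, assuming NumPy and xarray are installed. The shape `(2, 3, 4)` mirrors the new example: two locations, three instrument manufacturers, four days. With plain NumPy the reader has to remember that axis 1 means "instrument"; with named dimensions the reduction says so directly. Variable names such as `mean_over_instruments_np` are illustrative only.

```python
import numpy as np
import xarray as xr

np.random.seed(0)
values = 15 + 8 * np.random.randn(2, 3, 4)

# Plain NumPy: axis 1 only means "instrument" by convention.
mean_over_instruments_np = values.mean(axis=1)

# xarray: the same reduction, with the dimension named explicitly.
temperature = xr.DataArray(values, dims=("loc", "instrument", "time"))
mean_over_instruments_xr = temperature.mean("instrument")

print(mean_over_instruments_xr.dims)  # ('loc', 'time')
print(np.allclose(mean_over_instruments_np, mean_over_instruments_xr.values))  # True
```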


2 files changed, +71 -43 lines changed


doc/user-guide/data-structures.rst

Lines changed: 33 additions & 20 deletions
@@ -282,27 +282,40 @@ variables (``data_vars``), coordinates (``coords``) and attributes (``attrs``).
 
 - ``attrs`` should be a dictionary.
 
-Let's create some fake data for the example we show above:
+Let's create some fake data for the example we show above. In this
+example dataset, we will represent measurements of the temperature and
+precipitation that were made under various conditions:
+
+* the measurements were made on four different days;
+* they were made at two separate locations, which we will represent using
+  their latitude and longitude; and
+* they were made using instruments from three different manufacturers, which we
+  will refer to as `'manufac1'`, `'manufac2'`, and `'manufac3'`.
 
 .. ipython:: python
 
-    temp = 15 + 8 * np.random.randn(2, 2, 3)
-    precip = 10 * np.random.rand(2, 2, 3)
-    lon = [[-99.83, -99.32], [-99.79, -99.23]]
-    lat = [[42.25, 42.21], [42.63, 42.59]]
+    np.random.seed(0)
+    temperature = 15 + 8 * np.random.randn(2, 3, 4)
+    precipitation = 10 * np.random.rand(2, 3, 4)
+    lon = [-99.83, -99.32]
+    lat = [42.25, 42.21]
+    instruments = ["manufac1", "manufac2", "manufac3"]
+    time = pd.date_range("2014-09-06", periods=4)
+    reference_time = pd.Timestamp("2014-09-05")
 
     # for real use cases, it's good practice to supply array attributes such as
     # units, but we won't bother here for the sake of brevity
     ds = xr.Dataset(
         {
-            "temperature": (["x", "y", "time"], temp),
-            "precipitation": (["x", "y", "time"], precip),
+            "temperature": (["loc", "instrument", "time"], temperature),
+            "precipitation": (["loc", "instrument", "time"], precipitation),
         },
         coords={
-            "lon": (["x", "y"], lon),
-            "lat": (["x", "y"], lat),
-            "time": pd.date_range("2014-09-06", periods=3),
-            "reference_time": pd.Timestamp("2014-09-05"),
+            "lon": (["loc"], lon),
+            "lat": (["loc"], lat),
+            "instrument": instruments,
+            "time": time,
+            "reference_time": reference_time,
         },
     )
     ds
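As a quick check of the updated user-guide example, the sketch below (assuming NumPy, pandas, and xarray are installed) rebuilds the same dataset and inspects its dimension sizes and indexes. Only `instrument` and `time` receive an index; `loc` has no coordinate of the same name, so it stays a dimension without coordinates and only `lon`/`lat` describe it.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same inputs as the updated user-guide example.
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 3, 4)
precipitation = 10 * np.random.rand(2, 3, 4)
lon = [-99.83, -99.32]
lat = [42.25, 42.21]
instruments = ["manufac1", "manufac2", "manufac3"]
time = pd.date_range("2014-09-06", periods=4)
reference_time = pd.Timestamp("2014-09-05")

ds = xr.Dataset(
    {
        "temperature": (["loc", "instrument", "time"], temperature),
        "precipitation": (["loc", "instrument", "time"], precipitation),
    },
    coords={
        "lon": (["loc"], lon),
        "lat": (["loc"], lat),
        "instrument": instruments,
        "time": time,
        "reference_time": reference_time,
    },
)

# One named length per dimension: loc=2, instrument=3, time=4.
print(dict(ds.sizes))

# Only dimension coordinates get an index; "loc" has none.
print(sorted(ds.indexes))  # ['instrument', 'time']

# The index on "instrument" enables selection by manufacturer name.
print(ds.temperature.sel(instrument="manufac2").dims)  # ('loc', 'time')
```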
@@ -387,12 +400,12 @@ example, to create this example dataset from scratch, we could have written:
 .. ipython:: python
 
     ds = xr.Dataset()
-    ds["temperature"] = (("x", "y", "time"), temp)
-    ds["temperature_double"] = (("x", "y", "time"), temp * 2)
-    ds["precipitation"] = (("x", "y", "time"), precip)
-    ds.coords["lat"] = (("x", "y"), lat)
-    ds.coords["lon"] = (("x", "y"), lon)
-    ds.coords["time"] = pd.date_range("2014-09-06", periods=3)
+    ds["temperature"] = (("loc", "instrument", "time"), temperature)
+    ds["temperature_double"] = (("loc", "instrument", "time"), temperature * 2)
+    ds["precipitation"] = (("loc", "instrument", "time"), precipitation)
+    ds.coords["lat"] = (("loc",), lat)
+    ds.coords["lon"] = (("loc",), lon)
+    ds.coords["time"] = pd.date_range("2014-09-06", periods=4)
     ds.coords["reference_time"] = pd.Timestamp("2014-09-05")
 
 To change the variables in a ``Dataset``, you can use all the standard dictionary
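A minimal sketch of the incremental, dictionary-style construction in this hunk, assuming the same random inputs as the first example and that NumPy, pandas, and xarray are installed. The sketch additionally assigns `ds.coords["instrument"]` so that the manufacturer labels match the dictionary-based construction; without that line `instrument` would be a dimension without coordinates.

```python
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 3, 4)
precipitation = 10 * np.random.rand(2, 3, 4)
lon = [-99.83, -99.32]
lat = [42.25, 42.21]
instruments = ["manufac1", "manufac2", "manufac3"]

# Start empty and add variables one at a time, like entries in a dict.
ds = xr.Dataset()
ds["temperature"] = (("loc", "instrument", "time"), temperature)
ds["temperature_double"] = (("loc", "instrument", "time"), temperature * 2)
ds["precipitation"] = (("loc", "instrument", "time"), precipitation)
ds.coords["lat"] = (("loc",), lat)
ds.coords["lon"] = (("loc",), lon)
# A 1-D list whose name matches a dimension becomes that dimension's coordinate.
ds.coords["instrument"] = instruments
ds.coords["time"] = pd.date_range("2014-09-06", periods=4)
ds.coords["reference_time"] = pd.Timestamp("2014-09-05")

print(sorted(ds.data_vars))  # ['precipitation', 'temperature', 'temperature_double']
print(sorted(ds.coords))  # ['instrument', 'lat', 'lon', 'reference_time', 'time']
```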
@@ -452,8 +465,8 @@ follow nested function calls:
 
     # these lines are equivalent, but with pipe we can make the logic flow
     # entirely from left to right
-    plt.plot((2 * ds.temperature.sel(x=0)).mean("y"))
-    (ds.temperature.sel(x=0).pipe(lambda x: 2 * x).mean("y").pipe(plt.plot))
+    plt.plot((2 * ds.temperature.sel(loc=0)).mean("instrument"))
+    (ds.temperature.sel(loc=0).pipe(lambda x: 2 * x).mean("instrument").pipe(plt.plot))
 
 Both ``pipe`` and ``assign`` replicate the pandas methods of the same names
 (:py:meth:`DataFrame.pipe <pandas.DataFrame.pipe>` and
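The two `pipe` lines in this hunk differ only in reading order. The sketch below, assuming NumPy and xarray are installed, checks that both forms give identical results without plotting anything; `nested` and `piped` are illustrative names. Because `loc` carries no coordinate labels in this example, the sketch uses the explicitly positional `isel(loc=0)`.

```python
import numpy as np
import xarray as xr

np.random.seed(0)
temperature = xr.DataArray(
    15 + 8 * np.random.randn(2, 3, 4), dims=("loc", "instrument", "time")
)

# Nested calls are read from the inside out...
nested = (2 * temperature.isel(loc=0)).mean("instrument")

# ...while pipe applies the same steps in left-to-right order.
piped = temperature.isel(loc=0).pipe(lambda x: 2 * x).mean("instrument")

print(nested.dims)  # ('time',)
print(nested.identical(piped))  # True
```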
@@ -479,7 +492,7 @@ dimension and non-dimension variables:
 
 .. ipython:: python
 
-    ds.coords["day"] = ("time", [6, 7, 8])
+    ds.coords["day"] = ("time", [6, 7, 8, 9])
     ds.swap_dims({"time": "day"})
 
 .. _coordinates:
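The `day` coordinate grew from three to four values because it must supply one value per `time` step, and `time` now spans four days. A minimal sketch, assuming pandas and xarray are installed, showing both the swap and the length check; in the sketch the mismatched three-value assignment is expected to raise `ValueError` because it conflicts with the existing length of `time`.

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset(coords={"time": pd.date_range("2014-09-06", periods=4)})

# One "day" value per time step: four values for four days.
ds.coords["day"] = ("time", [6, 7, 8, 9])

# Make "day" the dimension coordinate; "time" stays as a coordinate along it.
swapped = ds.swap_dims({"time": "day"})
print(swapped["day"].dims)  # ('day',)
print(swapped["time"].dims)  # ('day',)

# The old three-value list no longer matches the length of "time".
rejected = False
try:
    ds.coords["day"] = ("time", [6, 7, 8])
except ValueError as err:
    rejected = True
    print("rejected:", err)
```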

xarray/core/dataset.py

Lines changed: 38 additions & 23 deletions
@@ -590,60 +590,75 @@ class Dataset(
 
     Examples
     --------
-    Create data:
+    In this example dataset, we will represent measurements of the temperature
+    and precipitation that were made under various conditions:
+
+    * the measurements were made on four different days;
+    * they were made at two separate locations, which we will represent using
+      their latitude and longitude; and
+    * they were made using three instruments developed by three different
+      manufacturers, which we will refer to using the strings `'manufac1'`,
+      `'manufac2'`, and `'manufac3'`.
 
     >>> np.random.seed(0)
-    >>> temperature = 15 + 8 * np.random.randn(2, 2, 3)
-    >>> precipitation = 10 * np.random.rand(2, 2, 3)
-    >>> lon = [[-99.83, -99.32], [-99.79, -99.23]]
-    >>> lat = [[42.25, 42.21], [42.63, 42.59]]
-    >>> time = pd.date_range("2014-09-06", periods=3)
+    >>> temperature = 15 + 8 * np.random.randn(2, 3, 4)
+    >>> precipitation = 10 * np.random.rand(2, 3, 4)
+    >>> lon = [-99.83, -99.32]
+    >>> lat = [42.25, 42.21]
+    >>> instruments = ["manufac1", "manufac2", "manufac3"]
+    >>> time = pd.date_range("2014-09-06", periods=4)
     >>> reference_time = pd.Timestamp("2014-09-05")
 
-    Initialize a dataset with multiple dimensions:
+    Here, we initialize the dataset with multiple dimensions. We use the string
+    `"loc"` to represent the location dimension of the data, the string
+    `"instrument"` to represent the instrument manufacturer dimension, and the
+    string `"time"` for the time dimension.
 
     >>> ds = xr.Dataset(
     ...     data_vars=dict(
-    ...         temperature=(["x", "y", "time"], temperature),
-    ...         precipitation=(["x", "y", "time"], precipitation),
+    ...         temperature=(["loc", "instrument", "time"], temperature),
+    ...         precipitation=(["loc", "instrument", "time"], precipitation),
     ...     ),
     ...     coords=dict(
-    ...         lon=(["x", "y"], lon),
-    ...         lat=(["x", "y"], lat),
+    ...         lon=("loc", lon),
+    ...         lat=("loc", lat),
+    ...         instrument=instruments,
     ...         time=time,
     ...         reference_time=reference_time,
     ...     ),
     ...     attrs=dict(description="Weather related data."),
     ... )
     >>> ds
-    <xarray.Dataset> Size: 288B
-    Dimensions:         (x: 2, y: 2, time: 3)
+    <xarray.Dataset> Size: 552B
+    Dimensions:         (loc: 2, instrument: 3, time: 4)
     Coordinates:
-        lon             (x, y) float64 32B -99.83 -99.32 -99.79 -99.23
-        lat             (x, y) float64 32B 42.25 42.21 42.63 42.59
-      * time            (time) datetime64[ns] 24B 2014-09-06 2014-09-07 2014-09-08
+        lon             (loc) float64 16B -99.83 -99.32
+        lat             (loc) float64 16B 42.25 42.21
+      * instrument      (instrument) <U8 96B 'manufac1' 'manufac2' 'manufac3'
+      * time            (time) datetime64[ns] 32B 2014-09-06 ... 2014-09-09
         reference_time  datetime64[ns] 8B 2014-09-05
-    Dimensions without coordinates: x, y
+    Dimensions without coordinates: loc
     Data variables:
-        temperature     (x, y, time) float64 96B 29.11 18.2 22.83 ... 16.15 26.63
-        precipitation   (x, y, time) float64 96B 5.68 9.256 0.7104 ... 4.615 7.805
+        temperature     (loc, instrument, time) float64 192B 29.11 18.2 ... 9.063
+        precipitation   (loc, instrument, time) float64 192B 4.562 5.684 ... 1.613
     Attributes:
         description:  Weather related data.
 
     Find out where the coldest temperature was and what values the
     other variables had:
 
     >>> ds.isel(ds.temperature.argmin(...))
-    <xarray.Dataset> Size: 48B
+    <xarray.Dataset> Size: 80B
     Dimensions:         ()
     Coordinates:
         lon             float64 8B -99.32
         lat             float64 8B 42.21
-        time            datetime64[ns] 8B 2014-09-08
+        instrument      <U8 32B 'manufac3'
+        time            datetime64[ns] 8B 2014-09-06
         reference_time  datetime64[ns] 8B 2014-09-05
     Data variables:
-        temperature     float64 8B 7.182
-        precipitation   float64 8B 8.326
+        temperature     float64 8B -5.424
+        precipitation   float64 8B 9.884
     Attributes:
         description:  Weather related data.
