Commit 9879944

Add Zarr and Xarray examples to docs (#655)
1 parent 848a5e7 commit 9879944

4 files changed (+147, -0 lines)

docs/examples/index.md

Lines changed: 2 additions & 0 deletions
@@ -8,5 +8,7 @@ maxdepth: 2
 ---
 how-to-run
 basic-array-ops
+zarr
+xarray
 pangeo
 ```

docs/examples/xarray.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
---
file_format: mystnb
kernelspec:
  name: python3
---
# Xarray

Cubed can work with Xarray datasets via the [`cubed-xarray`](https://github.com/cubed-dev/cubed-xarray) package.

Install by running the following:

```shell
pip install cubed cubed-xarray xarray pooch netCDF4
```

Note that `pooch` and `netCDF4` are needed to access the Xarray tutorial datasets that we use in the example below.

## Open dataset

Start by importing Xarray. Note that we don't need to import Cubed or `cubed-xarray`, since they are picked up automatically.

```{code-cell} ipython3
import xarray as xr

xr.set_options(display_expand_attrs=False, display_expand_data=True);
```

We open an Xarray dataset (in netCDF format) using the usual `open_dataset` function. By specifying `chunks={}` we ensure that the dataset is chunked using the on-disk chunking (here, the netCDF file chunking). The `chunked_array_type` argument specifies which chunked array type to use: Cubed in this case.

```{code-cell} ipython3
ds = xr.tutorial.open_dataset(
    "air_temperature", chunked_array_type="cubed", chunks={}
)
ds
```

Notice that the `air` data variable is a `cubed.Array`. Since Cubed has a lazy computation model, this array is not loaded from disk until a computation is run.

## Convert to Zarr

We can use Cubed to convert the dataset to Zarr format by calling `to_zarr` on the dataset:

```{code-cell} ipython3
ds.to_zarr("air_temperature_cubed.zarr", mode="w", consolidated=True);
```

This runs a computation that loads the input data and writes it out to a Zarr store on the local filesystem.

## Compute the mean

We can also use Xarray's API to run computations on the dataset using Cubed. Here we find the mean air temperature over time, for each location:

```{code-cell} ipython3
mean = ds.air.mean("time", skipna=False)
mean
```

To run the computation we need to call `compute`:

```{code-cell} ipython3
mean.compute()
```

This is fine for outputs that fit in memory, like the one here, but sometimes we want to write the output of the computation to Zarr instead. We do this by calling `to_zarr` on the result rather than `compute`:

```{code-cell} ipython3
mean.to_zarr("mean_air_temperature.zarr", mode="w", consolidated=True);
```

We can check that the Zarr store was created by loading it from disk using `xarray.open_dataset`:

```{code-cell} ipython3
xr.open_dataset(
    "mean_air_temperature.zarr", chunked_array_type="cubed", chunks={}
)
```
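The install note in `docs/examples/xarray.md` lists packages that are only needed for the tutorial data (`pooch`, `netCDF4`). As a minimal sketch that is not part of this commit, a reader could check which modules from the `pip install` line are importable before running the notebook. The helper name `missing_modules` is hypothetical, and the import names are assumptions about how each pip package is imported (for example, that `cubed-xarray` provides a module called `cubed_xarray`).

```python
import importlib.util

# Import names assumed for the packages in the `pip install` line:
# cubed, cubed-xarray (-> cubed_xarray), xarray, pooch, netCDF4.
REQUIRED_MODULES = ["cubed", "cubed_xarray", "xarray", "pooch", "netCDF4"]


def missing_modules(names):
    """Return the names (in order) that cannot be located on the import path.

    Hypothetical helper: it only looks modules up with find_spec and does
    not import them, so it has no side effects.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules needed by the Xarray example are installed.")
```

Using `find_spec` rather than a bare `import` keeps the check cheap and avoids triggering any import-time side effects in the packages being checked.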

docs/examples/zarr.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
---
2+
file_format: mystnb
3+
kernelspec:
4+
name: python3
5+
---
6+
# Zarr
7+
8+
Cubed was designed to work seamlessly with Zarr data. The examples below demonstrate using {py:func}`cubed.from_zarr`, {py:func}`cubed.to_zarr` and {py:func}`cubed.store` to read and write Zarr data.
9+
10+
## Write to Zarr
11+
12+
We'll start by creating a small chunked array containing random data in Cubed and writing it to Zarr using {py:func}`cubed.to_zarr`. Note that the call to `to_zarr` executes eagerly.
13+
14+
```{code-cell} ipython3
15+
import cubed
16+
import cubed.random
17+
18+
# 2MB chunks
19+
a = cubed.random.random((5000, 5000), chunks=(500, 500))
20+
21+
# write to Zarr
22+
cubed.to_zarr(a, "a.zarr")
23+
```
24+
25+
## Read from Zarr
26+
27+
We can check that the Zarr file was created by loading it from disk using {py:func}`cubed.from_zarr`:
28+
29+
```{code-cell} ipython3
30+
cubed.from_zarr("a.zarr")
31+
```
32+
33+
## Multiple arrays
34+
35+
To write multiple arrays in a single computation use {py:func}`cubed.store`:
36+
37+
```{code-cell} ipython3
38+
import cubed
39+
import cubed.random
40+
41+
# 2MB chunks
42+
a = cubed.random.random((5000, 5000), chunks=(500, 500))
43+
b = cubed.random.random((5000, 5000), chunks=(500, 500))
44+
45+
# write to Zarr
46+
arrays = [a, b]
47+
paths = ["a.zarr", "b.zarr"]
48+
cubed.store(arrays, paths)
49+
```
50+
51+
Then to read the Zarr files back, we use {py:func}`cubed.from_zarr` for each array and perform whatever array operations we like on them. Only when we call `to_zarr` is the whole computation executed.
52+
53+
```{code-cell} ipython3
54+
import cubed.array_api as xp
55+
56+
# read from Zarr
57+
a = cubed.from_zarr("a.zarr")
58+
b = cubed.from_zarr("b.zarr")
59+
60+
# perform operation
61+
c = xp.add(a, b)
62+
63+
# write to Zarr
64+
cubed.to_zarr(c, store="c.zarr")
65+
```
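The `# 2MB chunks` comment in `docs/examples/zarr.md` follows from the array shape, the chunk shape and the element size. A minimal sketch of that arithmetic, assuming `cubed.random.random` produces `float64` values (8 bytes each); the helper `chunk_stats` is hypothetical and only does the bookkeeping, without calling Cubed.

```python
import math

ITEMSIZE = 8  # bytes per element; assumes float64 output from cubed.random.random


def chunk_stats(shape, chunks, itemsize=ITEMSIZE):
    """Return (number of chunks, bytes in a full chunk, total bytes).

    Hypothetical helper. Edge chunks may be smaller than a full chunk when
    a dimension is not a multiple of its chunk size, hence the ceil.
    """
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    full_chunk_bytes = math.prod(chunks) * itemsize
    total_bytes = math.prod(shape) * itemsize
    return n_chunks, full_chunk_bytes, total_bytes


n_chunks, chunk_bytes, total_bytes = chunk_stats((5000, 5000), (500, 500))
print(n_chunks, chunk_bytes, total_bytes)  # 100 2000000 200000000
```

Under that assumption, each of `a` and `b` is 200 MB (decimal) split into 100 chunks of 2 MB, so the `cubed.store` call in the "Multiple arrays" section writes 200 chunks in total.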

docs/requirements.txt

Lines changed: 4 additions & 0 deletions
@@ -15,6 +15,10 @@ tenacity
 toolz
 tqdm
 zarr
+cubed-xarray
+xarray
+pooch
+netCDF4
 
 # docs
 sphinx-book-theme
