Commit e95a1fa

add new docs
1 parent e364c48 commit e95a1fa

10 files changed

+764
-0
lines changed

docs/api_reference.md

Lines changed: 1 addition & 0 deletions
@@ -71,4 +71,5 @@ and
71 71
```{eval-rst}
72 72
.. automodule:: mdio.core.serialization
73 73
:members:
74+
:exclude-members: create_rechunk_plan, write_rechunked_values
74 75
```

docs/conf.py

Lines changed: 10 additions & 0 deletions
@@ -17,6 +17,7 @@
17 17
"sphinx.ext.napoleon",
18 18
"sphinx.ext.intersphinx",
19 19
"sphinx.ext.autosummary",
20+
"sphinxcontrib.autodoc_pydantic",
20 21
"sphinx.ext.autosectionlabel",
21 22
"sphinx_click",
22 23
"sphinx_copybutton",
@@ -38,6 +39,7 @@
38 39
intersphinx_mapping = {
39 40
"python": ("https://docs.python.org/3", None),
40 41
"numpy": ("https://numpy.org/doc/stable/", None),
42+
"pydantic": ("https://docs.pydantic.dev/latest/", None),
41 43
"zarr": ("https://zarr.readthedocs.io/en/stable/", None),
42 44
}
43 45

@@ -50,6 +52,14 @@
50 52
autoclass_content = "class"
51 53
autosectionlabel_prefix_document = True
52 54

55+
autodoc_pydantic_field_list_validators = False
56+
autodoc_pydantic_field_swap_name_and_alias = True
57+
autodoc_pydantic_field_show_alias = False
58+
autodoc_pydantic_model_show_config_summary = False
59+
autodoc_pydantic_model_show_validator_summary = False
60+
autodoc_pydantic_model_show_validator_members = False
61+
autodoc_pydantic_model_show_field_summary = False
62+
53 63
html_theme = "furo"
54 64

55 65
myst_number_code_blocks = ["python"]

docs/data_models/chunk_grids.md

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
1+
```{eval-rst}
2+
:tocdepth: 3
3+
```
4+
5+
```{currentModule} mdio.schemas.chunk_grid
6+
7+
```
8+
9+
# Chunk Grid Models
10+
11+
```{article-info}
12+
:author: Altay Sansal
13+
:date: "{sub-ref}`today`"
14+
:read-time: "{sub-ref}`wordcount-minutes` min read"
15+
:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
16+
```
17+
18+
The variables in the MDIO data model can represent different types of chunk grids.
19+
These grids are essential for managing multi-dimensional data arrays efficiently.
20+
In this breakdown, we will explore four distinct data models within the MDIO schema,
21+
each serving a specific purpose in data handling and organization.
22+
23+
MDIO implements data models following the guidelines of the Zarr v3 spec and ZEPs:
24+
25+
- [Zarr core specification (version 3)](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)
26+
- [ZEP 1 — Zarr specification version 3](https://zarr.dev/zeps/accepted/ZEP0001.html)
27+
- [ZEP 3 — Variable chunking](https://zarr.dev/zeps/draft/ZEP0003.html)
28+
29+
## Regular Grid
30+
31+
The regular grid models are designed to represent a rectangular and regularly
32+
spaced chunk grid.
33+
34+
```{eval-rst}
35+
.. autosummary::
36+
RegularChunkGrid
37+
RegularChunkShape
38+
```
39+
40+
For a 1D array with `size = 31`{l=python}, a chunk size of 7 divides it into 5 chunks: four
41+
full chunks of 7 and a last chunk truncated to 3 to match the size of the array.
42+
43+
`{ "name": "regular", "configuration": { "chunkShape": [7] } }`{l=json}
44+
45+
Using the above schema, the resulting array chunks will look like this:
46+
47+
```bash
48+
←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ↔ 3
49+
┌───────┬───────┬───────┬───────┬───┐
50+
└───────┴───────┴───────┴───────┴───┘
51+
```
52+
53+
For a 2D array with shape `rows, cols = (7, 17)`{l=python}, we can divide it into 9
54+
chunks; those in the last row and last column are truncated at the array edges.
55+
56+
`{ "name": "regular", "configuration": { "chunkShape": [3, 7] } }`{l=json}
57+
58+
Using the above schema, the resulting 2D array chunks will look like below.
59+
Note that the rows and columns are conceptual and visually not to scale.
60+
61+
```bash
62+
←─ 7 ─→ ←─ 7 ─→ ↔ 3
63+
┌───────┬───────┬───┐
64+
│ ╎ ╎ │ ↑
65+
│ ╎ ╎ │ 3
66+
│ ╎ ╎ │ ↓
67+
├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
68+
│ ╎ ╎ │ ↑
69+
│ ╎ ╎ │ 3
70+
│ ╎ ╎ │ ↓
71+
├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
72+
│ ╎ ╎ │ ↕ 1
73+
└───────┴───────┴───┘
74+
```
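
The regular grid examples above can be reproduced with a few lines of plain Python. This is a minimal sketch, not part of the MDIO API: the helper name `regular_chunk_sizes` is hypothetical, and it only assumes that every chunk is full-sized except the last one along each dimension.

```python
from math import ceil, prod


def regular_chunk_sizes(shape, chunk_shape):
    """Return per-dimension chunk sizes for a regular grid (hypothetical helper)."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same number of dimensions")
    sizes = []
    for dim_size, chunk in zip(shape, chunk_shape):
        n_chunks = ceil(dim_size / chunk)
        # Every chunk is full-sized except possibly the last, which is truncated.
        sizes.append([min(chunk, dim_size - i * chunk) for i in range(n_chunks)])
    return sizes


one_d = regular_chunk_sizes((31,), (7,))
two_d = regular_chunk_sizes((7, 17), (3, 7))
num_chunks_2d = prod(len(dim) for dim in two_d)

print(one_d)          # [[7, 7, 7, 7, 3]]
print(two_d)          # [[3, 3, 1], [7, 7, 3]]
print(num_chunks_2d)  # 9
```

The printed sizes match the widths and heights drawn in the two diagrams.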
75+
76+
## Rectilinear Grid
77+
78+
The [RectilinearChunkGrid](RectilinearChunkGrid) model extends
79+
the concept of chunk grids to accommodate rectangular and irregularly spaced chunks.
80+
This model is useful in data structures where non-uniform chunk sizes are necessary.
81+
[RectilinearChunkShape](RectilinearChunkShape) specifies the chunk sizes for each
82+
dimension as a list, allowing for irregular intervals.
83+
84+
```{eval-rst}
85+
.. autosummary::
86+
RectilinearChunkGrid
87+
RectilinearChunkShape
88+
```
89+
90+
:::{note}
91+
It's important to ensure that the sum of the irregular spacings specified
92+
in the `chunkShape` matches the size of the respective array dimension.
93+
:::
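
The constraint in the note can be checked before writing any data. The sketch below is illustrative only: `validate_rectilinear` is a hypothetical helper, not an MDIO function, and it assumes `chunk_shape` is a list with one list of chunk sizes per dimension, as in the `chunkShape` examples in this section.

```python
def validate_rectilinear(shape, chunk_shape):
    """Raise ValueError unless each dimension's chunk sizes sum to its size."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same number of dimensions")
    for axis, (dim_size, chunks) in enumerate(zip(shape, chunk_shape)):
        total = sum(chunks)
        if total != dim_size:
            raise ValueError(f"axis {axis}: chunks sum to {total}, expected {dim_size}")


# Both layouts used in this section satisfy the constraint.
validate_rectilinear((39,), [[10, 7, 5, 7, 10]])
validate_rectilinear((7, 25), [[3, 1, 3], [10, 5, 7, 3]])

# A mismatched layout is rejected.
try:
    validate_rectilinear((40,), [[10, 7, 5, 7, 10]])
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```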
94+
95+
For a 1D array with `size = 39`{l=python}, we can divide it into 5 irregularly sized
96+
chunks.
97+
98+
`{ "name": "rectilinear", "configuration": { "chunkShape": [[10, 7, 5, 7, 10]] } }`{l=json}
99+
100+
Using the above schema, the resulting array chunks will look like this:
101+
102+
```bash
103+
←── 10 ──→ ←─ 7 ─→ ← 5 → ←─ 7 ─→ ←── 10 ──→
104+
┌──────────┬───────┬─────┬───────┬──────────┐
105+
└──────────┴───────┴─────┴───────┴──────────┘
106+
```
107+
108+
For a 2D array with shape `rows, cols = (7, 25)`{l=python}, we can divide it into 12
109+
rectilinear (rectangular but irregular) chunks. Note that the rows and columns are
110+
conceptual and visually not to scale.
111+
112+
`{ "name": "rectilinear", "configuration": { "chunkShape": [[3, 1, 3], [10, 5, 7, 3]] } }`{l=json}
113+
114+
```bash
115+
←── 10 ──→ ← 5 → ←─ 7 ─→ ↔ 3
116+
┌──────────┬─────┬───────┬───┐
117+
│ ╎ ╎ ╎ │ ↑
118+
│ ╎ ╎ ╎ │ 3
119+
│ ╎ ╎ ╎ │ ↓
120+
├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
121+
│ ╎ ╎ ╎ │ ↕ 1
122+
├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
123+
│ ╎ ╎ ╎ │ ↑
124+
│ ╎ ╎ ╎ │ 3
125+
│ ╎ ╎ ╎ │ ↓
126+
└──────────┴─────┴───────┴───┘
127+
```
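
To see where each rectilinear chunk starts and stops, the per-dimension sizes can be turned into index ranges with a cumulative sum. This is a rough sketch under that assumption. `chunk_slices` is a hypothetical name and does not reflect how MDIO or Zarr compute chunk boundaries internally.

```python
from itertools import accumulate, product


def chunk_slices(chunk_shape):
    """Return one tuple of slices per chunk, iterating the last dimension fastest."""
    per_dim = []
    for chunks in chunk_shape:
        stops = list(accumulate(chunks))  # running end index of each chunk
        starts = [0] + stops[:-1]         # each chunk starts where the previous one ended
        per_dim.append([slice(a, b) for a, b in zip(starts, stops)])
    return list(product(*per_dim))


slices = chunk_slices([[3, 1, 3], [10, 5, 7, 3]])

print(len(slices))  # 12
print(slices[0])    # (slice(0, 3, None), slice(0, 10, None))
print(slices[-1])   # (slice(4, 7, None), slice(22, 25, None))
```

Each tuple can be used directly to index a 2D array, for example `array[slices[0]]` for the top-left chunk.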
128+
129+
## Model Reference
130+
131+
:::{dropdown} RegularChunkGrid
132+
:animate: fade-in-slide-down
133+
134+
```{eval-rst}
135+
.. autopydantic_model:: RegularChunkGrid
136+
137+
----------
138+
139+
.. autopydantic_model:: RegularChunkShape
140+
```
141+
142+
:::
143+
:::{dropdown} RectilinearChunkGrid
144+
:animate: fade-in-slide-down
145+
146+
```{eval-rst}
147+
.. autopydantic_model:: RectilinearChunkGrid
148+
149+
----------
150+
151+
.. autopydantic_model:: RectilinearChunkShape
152+
```
153+
154+
:::

docs/data_models/compressors.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
```{eval-rst}
2+
:tocdepth: 3
3+
```
4+
5+
```{currentModule} mdio.schemas.compressors
6+
7+
```
8+
9+
# Compressors
10+
11+
```{article-info}
12+
:author: Altay Sansal
13+
:date: "{sub-ref}`today`"
14+
:read-time: "{sub-ref}`wordcount-minutes` min read"
15+
:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
16+
```
17+
18+
## Dataset Compression
19+
20+
MDIO relies on [numcodecs] for data compression. We provide good defaults based
21+
on opinionated and limited heuristics for each compressor for various energy datasets.
22+
However, using these data models, the compression can be customized.
23+
24+
[Numcodecs] is a project that provides a convenient interface to different compression
25+
libraries. We selected the [Blosc] and [ZFP] compressors for lossless and lossy
26+
compression of energy data.
27+
28+
## Blosc
29+
30+
Blosc is a high-performance compressor optimized for binary data, combining fast compression
31+
with a byte-shuffle filter for enhanced efficiency, particularly effective with
32+
numerical arrays in multi-threaded environments.
33+
34+
For more details about compression modes, see [Blosc Documentation].
35+
36+
```{eval-rst}
37+
.. autosummary::
38+
Blosc
39+
```
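
The byte-shuffle idea mentioned above can be illustrated with the standard library alone. This is a conceptual sketch and not Blosc's implementation: `byte_shuffle` and `byte_unshuffle` are hypothetical helpers, and `zlib` stands in for the codecs that Blosc wraps. The shuffle groups the k-th byte of every element together, which tends to produce longer runs of similar bytes for smoothly varying numerical data.

```python
import struct
import zlib


def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    """Group the k-th byte of every element together (k = 0 .. itemsize - 1)."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    return b"".join(data[k::itemsize] for k in range(itemsize))


def byte_unshuffle(data: bytes, itemsize: int) -> bytes:
    """Invert byte_shuffle, restoring the original element layout."""
    n_items = len(data) // itemsize
    out = bytearray(len(data))
    for k in range(itemsize):
        out[k::itemsize] = data[k * n_items : (k + 1) * n_items]
    return bytes(out)


# A smooth ramp of 1000 little-endian float32 values stands in for real array data.
values = [i * 0.001 for i in range(1000)]
raw = struct.pack(f"<{len(values)}f", *values)

shuffled = byte_shuffle(raw, itemsize=4)
restored = byte_unshuffle(shuffled, itemsize=4)

# Compare compressed sizes with and without the shuffle filter.
plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffled))
print(len(raw), plain_size, shuffled_size)
print(restored == raw)  # True
```

The shuffle is lossless and reversible, so it can be applied before any general-purpose compressor and undone after decompression.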
40+
41+
## ZFP
42+
43+
ZFP is a compression algorithm tailored for floating-point and integer arrays, offering
44+
lossy and lossless compression with customizable precision, well-suited for large
45+
scientific datasets with a focus on balancing data fidelity and compression ratio.
46+
47+
For more details about compression modes, see [ZFP Documentation].
48+
49+
```{eval-rst}
50+
.. autosummary::
51+
ZFP
52+
```
53+
54+
[numcodecs]: https://github.com/zarr-developers/numcodecs
55+
[blosc]: https://github.com/Blosc/c-blosc
56+
[blosc documentation]: https://www.blosc.org/python-blosc/python-blosc.html
57+
[zfp]: https://github.com/LLNL/zfp
58+
[zfp documentation]: https://computing.llnl.gov/projects/zfp
59+
60+
## Model Reference
61+
62+
:::
63+
:::{dropdown} Blosc
64+
:animate: fade-in-slide-down
65+
66+
```{eval-rst}
67+
.. autopydantic_model:: Blosc
68+
69+
----------
70+
71+
.. autoclass:: BloscAlgorithm()
72+
:members:
73+
:undoc-members:
74+
:member-order: bysource
75+
76+
----------
77+
78+
.. autoclass:: BloscShuffle()
79+
:members:
80+
:undoc-members:
81+
:member-order: bysource
82+
```
83+
84+
:::
85+
86+
:::{dropdown} ZFP
87+
:animate: fade-in-slide-down
88+
89+
```{eval-rst}
90+
.. autopydantic_model:: ZFP
91+
92+
----------
93+
94+
.. autoclass:: ZFPMode()
95+
:members:
96+
:undoc-members:
97+
:member-order: bysource
98+
```
99+
100+
:::
