
Commit 7fb9609

Merge pull request #173 from thewtex/large-faq
DOC: Add to_multiscales large dataset FAQ entry
2 parents: 326a6c7 + 644f08c

File tree

2 files changed: +31 −0 lines changed


docs/faq.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# Frequently Asked Questions (FAQ)

## Performance and Memory

### Why does `to_multiscales` perform computation immediately with large datasets?
For both small and large datasets, `to_multiscales` returns a simple Python
dataclass composed of basic Python datatypes and lazy dask arrays. A lazy dask
array, like all dask arrays, is backed by a task graph that defines how to
generate its contents; the data itself does not exist in memory until computed.
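The task-graph idea can be sketched in plain Python. This is an illustrative toy under stated assumptions, not dask's actual scheduler; the `get` function and the graph layout here are hypothetical stand-ins for the concept. Tasks are recorded as dictionary entries, and nothing executes until a result is requested:

```python
# Minimal toy sketch of a task graph (illustrative only, not dask's API).
# Each value is either a literal, or a tuple (callable, *args) where a
# string argument naming another key refers to that task's result.

def get(graph, key):
    """Recursively execute the task registered under `key`."""
    task = graph[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        # Resolve arguments that name other tasks in the graph.
        resolved = [
            get(graph, a) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        return func(*resolved)
    return task  # Literal value; no computation needed.

# Defining the graph performs no computation...
graph = {
    "data": [1, 2, 3, 4],
    "doubled": (lambda xs: [2 * x for x in xs], "data"),
    "total": (sum, "doubled"),
}

# ...work happens only when a result is requested.
print(get(graph, "total"))  # → 20
```

A real dask array works the same way at much larger scale: the graph records how to produce each chunk, and memory is only consumed for the chunks being computed.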
For very large datasets, however, data conditioning and task graph engineering
are performed during construction to improve performance and avoid exhausting
system memory. This preprocessing step ensures that when you eventually compute
the arrays, the operations are optimized and memory-efficient.
If you want to avoid this behavior entirely, pass `cache=False` to
`to_multiscales`:

```python
multiscales = to_multiscales(image, cache=False)
```
**Warning:** Disabling caching may cause you to run out of memory when working
with very large datasets!
The lazy evaluation approach allows ngff-zarr to handle extremely large datasets
that would not fit in memory, while still providing optimal performance through
intelligent task graph optimization.

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ cli.md
 mcp.md
 itk.md
 methods.md
+faq.md
 development.md
 ```
