Commit bdf6dba

Diagnostics documentation (#540)
1 parent 479f1bf commit bdf6dba

4 files changed: +200 -14 lines changed

docs/images/cubed-add.svg

Lines changed: 132 additions & 0 deletions

docs/user-guide/diagnostics.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Diagnostics

Cubed provides a variety of tools to understand a computation before running it, to monitor its progress while running, and to view performance statistics after it has completed.

To use these features, ensure that the optional dependencies for diagnostics have been installed:

```shell
python -m pip install "cubed[diagnostics]"
```

## Visualize the computation plan

Before running a computation, Cubed creates an internal plan that it uses to compute the output arrays.

The plan is a directed acyclic graph (DAG), and it can be useful to visualize it to see the number of steps involved in your computation, the number of tasks in each step (and overall), and the amount of intermediate data written out.

The {py:meth}`Array.visualize() <cubed.Array.visualize()>` method on an array creates an image of the DAG. By default it is saved in a file called *cubed.svg* in the current working directory, but the filename and format can be changed if needed. If running in a Jupyter notebook, the image will be rendered in the notebook.

If you are computing multiple arrays at once, use the {py:func}`visualize <cubed.visualize>` function, which takes multiple array arguments.

This example shows a tiny computation and the resulting plan:

```python
import cubed.array_api as xp

# Two small 3x3 arrays, each split into 2x2 chunks
a = xp.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2))
b = xp.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2))
c = xp.add(a, b)  # lazy: builds the plan, nothing is computed yet

# Save the plan as cubed.svg (rendered inline in a Jupyter notebook)
c.visualize()
```

![Cubed visualization of a tiny computation](../images/cubed-add.svg)

There are two types of nodes in the plan. Boxes with rounded corners are operations, while boxes with square corners are arrays.

In this case there are three operations (labelled `op-001`, `op-002`, and `op-003`), which produce the three arrays `a`, `b`, and `c`. (There is always an additional operation called `create-arrays`, shown on the right, which Cubed creates automatically.)

Array `c` is coloured orange, which means it is materialized as a Zarr array. Arrays `a` and `b` do not need to be materialized as Zarr arrays since they are small constant arrays that are passed to the workers running the tasks.

Similarly, the operation that produces `c` is shown in a lilac colour to signify that it runs tasks to produce the output. Operations `op-001` and `op-002` don't run any tasks since `a` and `b` are just small constant arrays.

## Progress bar

You can display a progress bar to track your computation by passing callbacks to {py:meth}`compute() <cubed.Array.compute()>`:

```ipython
>>> from cubed.diagnostics.rich import RichProgressBar
>>> progress = RichProgressBar()
>>> c.compute(callbacks=[progress])  # c is the array from above
create-arrays 1/1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 0:00:00
op-003 add 4/4 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 0:00:00
```

The progress bar works in Jupyter notebooks and with all executors.
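
The `4/4` shown for `op-003 add` can be sanity-checked by counting output chunks. The sketch below is plain Python, not Cubed's API: `num_chunks` is a hypothetical helper, and it assumes one task per output chunk, which is a reasonable expectation for a simple elementwise operation such as `add` but may not hold for every operation.

```python
import math


def num_chunks(shape, chunks):
    """Return (chunks per axis, total chunks) for a regular chunk grid."""
    # The last chunk along an axis may be smaller, hence the ceiling division
    per_axis = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
    return per_axis, math.prod(per_axis)


# The 3x3 array from the example above, split into 2x2 chunks
per_axis, total = num_chunks(shape=(3, 3), chunks=(2, 2))
print(per_axis, total)  # (2, 2) 4
```

A 3x3 array with 2x2 chunks has a 2x2 chunk grid, so four tasks is what you would expect to see in the progress bar under this assumption.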

You can also pass callbacks to functions that call `compute`, such as {py:func}`store <cubed.store>` or {py:func}`to_zarr <cubed.to_zarr>`.

## History and timeline visualization

The history and timeline visualization callbacks can be used to find out how long tasks took to run, and how much memory they used.

The timeline visualization is useful for determining how much time was spent in worker startup, as well as how much stragglers affected the overall time of the computation. (Ideally, we want vertical lines on this plot, which would represent perfect horizontal scaling.)

See the [examples](https://github.com/cubed-dev/cubed/blob/main/examples/README.md) for more information about how to use these callbacks.
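
To make the effect of stragglers concrete, here is a minimal sketch in plain Python rather than with Cubed's callbacks. The task timings are invented for illustration, and `straggler_overhead` is a hypothetical helper that is not part of Cubed; it compares when the slowest task finishes with the median finish time.

```python
from statistics import median

# Hypothetical (start, end) times in seconds for the tasks of one operation.
# These are made-up values, not output from Cubed.
task_times = [(0.0, 2.0), (0.25, 2.5), (0.25, 3.0), (0.5, 6.0)]


def straggler_overhead(times):
    """Return (makespan, typical_end, overhead), all in seconds.

    makespan is when the last task finishes, typical_end is the median
    finish time, and overhead is the gap between the two. Times are
    measured from the earliest task start.
    """
    first_start = min(start for start, _ in times)
    ends = [end - first_start for _, end in times]
    makespan = max(ends)
    typical_end = median(ends)
    return makespan, typical_end, makespan - typical_end


makespan, typical_end, overhead = straggler_overhead(task_times)
print(f"makespan={makespan}s typical={typical_end}s overhead={overhead}s")
```

With these invented timings the last task finishes at 6.0 s while the median finish time is 2.75 s, so a single straggler adds 3.25 s to the operation. On a timeline plot, an overhead near zero corresponds to the near-vertical lines described above.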

docs/user-guide/index.md

Lines changed: 1 addition & 0 deletions
@@ -11,4 +11,5 @@ storage
 memory
 reliability
 scaling
+diagnostics
 ```

docs/user-guide/scaling.md

Lines changed: 1 addition & 14 deletions
@@ -78,20 +78,7 @@ Different cloud providers' serverless offerings may perform differently. For exa
 
 ## Diagnosing Performance
 
-To understand how your computation could perform better you first need to diagnose the source of any problems.
-
-### Optimized Plan
-
-Use {py:meth}`Plan.visualize() <cubed.Plan.visualize()>` to view the optimized plan. This allows you to see the number of steps involved in your calculation, the number of tasks in each step, and overall.
-
-### History Callback
-
-The history callback function can help determine how much time was spent in worker startup, as well as how much stragglers affected the overall speed.
-
-### Timeline Visualization Callback
-
-A timeline visualization callback can provide a visual representation of the above points. Ideally, we want vertical lines on this plot, which would represent perfect horizontal scaling.
-
+See <project:diagnostics.md>.
 
 ## Tips
 