Commit d866853

docs: async iteration notebook (#155)
1 parent 1d1991e commit d866853

4 files changed: 319 additions & 0 deletions

docs/notebooks/async-search.ipynb

Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "d2efdd4c",
6+
"metadata": {},
7+
"source": [
8+
"# Async iteration\n",
9+
"\n",
10+
"In [rustac v0.8.1](https://github.com/stac-utils/rustac-py/releases/tag/v0.8.1) we added the ability to iterate a search asynchronously.\n",
11+
"Let's compare this new capability with the synchronous version via [pystac-client](https://github.com/stac-utils/pystac-client).\n",
12+
"\n",
13+
"The `copernicus-dem` collection at https://stac.eoapi.dev has 26450 items, which makes it a good single collection test case for iterating over a bunch of things."
14+
]
15+
},
16+
{
17+
"cell_type": "code",
18+
"execution_count": 1,
19+
"id": "eaf03f73",
20+
"metadata": {},
21+
"outputs": [
22+
{
23+
"name": "stderr",
24+
"output_type": "stream",
25+
"text": [
26+
"building \"rustac\"\n",
27+
"rebuilt and loaded package \"rustac\" in 4.783s\n"
28+
]
29+
}
30+
],
31+
"source": [
32+
"import time\n",
33+
"from tqdm.notebook import tqdm\n",
34+
"\n",
35+
"import rustac\n",
36+
"from pystac_client import Client\n",
37+
"\n",
38+
"url = \"https://stac.eoapi.dev\"\n",
39+
"collection = \"copernicus-dem\"\n",
40+
"total = 26450"
41+
]
42+
},
43+
{
44+
"cell_type": "markdown",
45+
"id": "3c4ecaf4",
46+
"metadata": {},
47+
"source": [
48+
"First, let's try **pystac-client**.\n",
49+
"In our testing, it takes almost six minutes to iterate over everything, so we're going to limit things to the first one thousand items."
50+
]
51+
},
52+
{
53+
"cell_type": "code",
54+
"execution_count": 2,
55+
"id": "fba8e0ca",
56+
"metadata": {},
57+
"outputs": [
58+
{
59+
"data": {
60+
"application/vnd.jupyter.widget-view+json": {
61+
"model_id": "db2cb3a4b8894a88a50358b8422c4f1a",
62+
"version_major": 2,
63+
"version_minor": 0
64+
},
65+
"text/plain": [
66+
" 0%| | 0/1000 [00:00<?, ?it/s]"
67+
]
68+
},
69+
"metadata": {},
70+
"output_type": "display_data"
71+
},
72+
{
73+
"name": "stdout",
74+
"output_type": "stream",
75+
"text": [
76+
"Got 1000 items in 14.28 seconds\n"
77+
]
78+
}
79+
],
80+
"source": [
81+
"client = Client.open(url)\n",
82+
"items = []\n",
83+
"progress = tqdm(total=1000)\n",
84+
"\n",
85+
"start = time.time()\n",
86+
"item_search = client.search(collections=[collection])\n",
87+
"for item in item_search.items():\n",
88+
" items.append(item)\n",
89+
" progress.update()\n",
90+
" if len(items) >= 1000:\n",
91+
" break\n",
92+
"print(f\"Got {len(items)} items in {time.time() - start:.2f} seconds\")\n",
93+
"progress.close()"
94+
]
95+
},
96+
{
97+
"cell_type": "markdown",
98+
"id": "e63b830f",
99+
"metadata": {},
100+
"source": [
101+
"**rustac** does some asynchronous page pre-fetching under the hood, so it might be faster?\n",
102+
"Let's find out."
103+
]
104+
},
105+
{
106+
"cell_type": "code",
107+
"execution_count": 3,
108+
"id": "211b184a",
109+
"metadata": {},
110+
"outputs": [
111+
{
112+
"data": {
113+
"application/vnd.jupyter.widget-view+json": {
114+
"model_id": "d3bdd8312b004cd3a2b537e429be1e5e",
115+
"version_major": 2,
116+
"version_minor": 0
117+
},
118+
"text/plain": [
119+
" 0%| | 0/1000 [00:00<?, ?it/s]"
120+
]
121+
},
122+
"metadata": {},
123+
"output_type": "display_data"
124+
},
125+
{
126+
"name": "stdout",
127+
"output_type": "stream",
128+
"text": [
129+
"Got 1000 items in 13.67 seconds\n"
130+
]
131+
}
132+
],
133+
"source": [
134+
"progress = tqdm(total=1000)\n",
135+
"items = []\n",
136+
"\n",
137+
"start = time.time()\n",
138+
"search = await rustac.iter_search(url, collections=[collection])\n",
139+
"async for item in search:\n",
140+
" items.append(item)\n",
141+
" progress.update()\n",
142+
" if len(items) >= 1000:\n",
143+
" break\n",
144+
"print(f\"Got {len(items)} items in {time.time() - start:.2f} seconds\")\n",
145+
"progress.close()"
146+
]
147+
},
148+
{
149+
"cell_type": "markdown",
150+
"id": "a0f4fae8",
151+
"metadata": {},
152+
"source": [
153+
"Okay, that's about the same, which suggests we're mostly being limited by server response time.\n",
154+
"If we increase the page size, does that make our async iteration faster?"
155+
]
156+
},
157+
{
158+
"cell_type": "code",
159+
"execution_count": 4,
160+
"id": "8ca810fc",
161+
"metadata": {},
162+
"outputs": [
163+
{
164+
"data": {
165+
"application/vnd.jupyter.widget-view+json": {
166+
"model_id": "2d571afb0b684671b1e3316fbc9716db",
167+
"version_major": 2,
168+
"version_minor": 0
169+
},
170+
"text/plain": [
171+
" 0%| | 0/5000 [00:00<?, ?it/s]"
172+
]
173+
},
174+
"metadata": {},
175+
"output_type": "display_data"
176+
},
177+
{
178+
"name": "stdout",
179+
"output_type": "stream",
180+
"text": [
181+
"Got 5000 items in 11.09 seconds\n"
182+
]
183+
}
184+
],
185+
"source": [
186+
"client = Client.open(url)\n",
187+
"items = []\n",
188+
"progress = tqdm(total=5000)\n",
189+
"\n",
190+
"start = time.time()\n",
191+
"item_search = client.search(collections=[collection], limit=500)\n",
192+
"for item in item_search.items():\n",
193+
" items.append(item)\n",
194+
" progress.update()\n",
195+
" if len(items) >= 5000:\n",
196+
" break\n",
197+
"print(f\"Got {len(items)} items in {time.time() - start:.2f} seconds\")\n",
198+
"progress.close()"
199+
]
200+
},
201+
{
202+
"cell_type": "code",
203+
"execution_count": 5,
204+
"id": "e6a00733",
205+
"metadata": {},
206+
"outputs": [
207+
{
208+
"data": {
209+
"application/vnd.jupyter.widget-view+json": {
210+
"model_id": "5c130f3626b64524a62a6daccde79694",
211+
"version_major": 2,
212+
"version_minor": 0
213+
},
214+
"text/plain": [
215+
" 0%| | 0/5000 [00:00<?, ?it/s]"
216+
]
217+
},
218+
"metadata": {},
219+
"output_type": "display_data"
220+
},
221+
{
222+
"name": "stdout",
223+
"output_type": "stream",
224+
"text": [
225+
"Got 5000 items in 10.77 seconds\n"
226+
]
227+
}
228+
],
229+
"source": [
230+
"progress = tqdm(total=5000)\n",
231+
"items = []\n",
232+
"\n",
233+
"start = time.time()\n",
234+
"search = await rustac.iter_search(url, collections=[collection], limit=500)\n",
235+
"async for item in search:\n",
236+
" items.append(item)\n",
237+
" progress.update()\n",
238+
" if len(items) >= 5000:\n",
239+
" break\n",
240+
"print(f\"Got {len(items)} items in {time.time() - start:.2f} seconds\")\n",
241+
"progress.close()"
242+
]
243+
}
244+
],
245+
"metadata": {
246+
"kernelspec": {
247+
"display_name": "rustac-py",
248+
"language": "python",
249+
"name": "python3"
250+
},
251+
"language_info": {
252+
"codemirror_mode": {
253+
"name": "ipython",
254+
"version": 3
255+
},
256+
"file_extension": ".py",
257+
"mimetype": "text/x-python",
258+
"name": "python",
259+
"nbconvert_exporter": "python",
260+
"pygments_lexer": "ipython3",
261+
"version": "3.13.2"
262+
}
263+
},
264+
"nbformat": 4,
265+
"nbformat_minor": 5
266+
}
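
The notebook's observation that both clients are mostly limited by server response time, and that a larger page size helps, can be illustrated without a network. The following is only a minimal sketch under stated assumptions: `fake_pages`, `take`, and `timed` are hypothetical names that are not part of rustac or pystac-client, and the fixed per-page latency is an invented stand-in for a real STAC API round trip.

```python
import asyncio
import time
from typing import AsyncIterator


async def fake_pages(total: int, page_size: int, latency: float) -> AsyncIterator[dict]:
    """Hypothetical stand-in for a paged search: sleep once per page, then yield its items."""
    fetched = 0
    while fetched < total:
        await asyncio.sleep(latency)  # simulated server round trip (assumed constant)
        count = min(page_size, total - fetched)
        for i in range(fetched, fetched + count):
            yield {"id": f"item-{i}"}
        fetched += count


async def take(items: AsyncIterator[dict], n: int) -> list[dict]:
    """Collect at most n items from an async iterator, then stop iterating."""
    out: list[dict] = []
    if n <= 0:
        return out
    async for item in items:
        out.append(item)
        if len(out) >= n:
            break
    return out


async def timed(page_size: int, n: int = 100, latency: float = 0.01) -> tuple[int, float]:
    """Return (number of items collected, elapsed seconds) for one simulated search."""
    start = time.perf_counter()
    got = await take(fake_pages(1000, page_size, latency), n)
    return len(got), time.perf_counter() - start


small = asyncio.run(timed(page_size=10))   # needs 10 simulated round trips
large = asyncio.run(timed(page_size=100))  # needs 1 simulated round trip
print(f"page_size=10:  {small[0]} items in {small[1]:.2f} seconds")
print(f"page_size=100: {large[0]} items in {large[1]:.2f} seconds")
```

Inside a notebook, where an event loop is already running, `await timed(page_size=10)` would replace the `asyncio.run(...)` calls. The sketch models only latency per page; it says nothing about how rustac's pre-fetching behaves.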

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ nav:
2525
- notebooks/stac-geoparquet.ipynb
2626
- notebooks/its-live.ipynb
2727
- notebooks/search.ipynb
28+
- notebooks/async-search.ipynb
2829
- API:
2930
- api/index.md
3031
- arrow: api/arrow.md

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -76,13 +76,15 @@ docs = [
7676
"griffe>=1.6.0",
7777
"humanize>=4.12.1",
7878
"ipykernel>=6.29.5",
79+
"ipywidgets>=8.1.7",
7980
"jinja2>=3.1.4",
8081
"mike>=2.1.3",
8182
"mkdocs-jupyter>=0.25.1",
8283
"mkdocs-material[imaging]>=9.5.45",
8384
"mkdocstrings[python]>=0.27.0",
8485
"obstore>=0.6.0",
8586
"pystac-client>=0.8.5",
87+
"tqdm>=4.67.1",
8688
]
8789

8890
[tool.uv]

uv.lock

Lines changed: 50 additions & 0 deletions
Some generated files are not rendered by default.
