Fix recursive search in Client.get_items #799

Open
wants to merge 9 commits into
base: main
Changes from 2 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

- Fix usage documentation of `ItemSearch`
- Fix fields argument to CLI ([#797](https://github.com/stac-utils/pystac-client/pull/797))
- Fix recursive search in `Client.get_items` ([#799](https://github.com/stac-utils/pystac-client/pull/799))

## [v0.8.6] - 2025-02-11

23 changes: 15 additions & 8 deletions pystac_client/client.py
@@ -443,27 +443,34 @@ def get_collections(self) -> Iterator[Collection]:
call_modifier(self.modifier, collection)
yield collection

def get_items(
self, *ids: str, recursive: bool | None = None
) -> Iterator["Item_Type"]:
def get_items(self, *ids: str, recursive: bool = True) -> Iterator["Item_Type"]:
"""Return all items of this catalog.

Args:
ids: Zero or more item ids to find.
recursive: unused in pystac-client, but needed for falling back to pystac
recursive : If True, search this catalog and all children for the
item; otherwise, only search the items of this catalog. Defaults
to True.

Return:
Iterator[Item]: Iterator of items whose parent is this
catalog.
"""
if self.conforms_to(ConformanceClasses.ITEM_SEARCH):
search = self.search(ids=ids)
if recursive:
search = self.search(ids=ids)
try:
yield from search.items()
return
except APIError:
child_catalogs = [catalog for catalog, _, _ in self.walk()]
search = self.search(ids=ids, collections=[self, *child_catalogs])
Member

This seems like it would be pretty easy to trigger accidentally. I'd prefer to just let the error raise, which makes it a little harder to fetch every single item in Planetary Computer, for instance.

Author

My concern is that without something like this, many functions that call get_items simply don't work for Planetary Computer or similar APIs that require the collections argument. These include:

  • Client.get_all_items,
  • Client.walk,
  • Client.validate_all,
  • Client.describe,
  • Client.make_all_asset_hrefs_relative,
  • Client.make_all_asset_hrefs_absolute

Note that the spec doesn't say either way whether these arguments must be optional, so I'm guessing that Planetary Computer's API is technically still spec-compliant. However, the examples show a search without collections, which suggests it should be supported, so I'm not sure how to interpret that:

https://github.com/radiantearth/stac-api-spec/blob/604ade6158de15b8ab068320ca41e25e2bf0e116/item-search/examples.md?plain=1#L27

Otherwise, the only way to make this work for APIs like Planetary Computer is to override the Client class, like this:

import pystac_client

class Client(pystac_client.Client):
    def search(self, *args, **kwargs):
        # Fill in every collection id when the caller did not pass any.
        # kwargs.get avoids a KeyError when "collections" is omitted entirely.
        if kwargs.get("collections") is None:
            kwargs["collections"] = [self.id, *[catalog.id for catalog, _, _ in self.walk()]]
        return super().search(*args, **kwargs)

pystac_client.client.Client = Client  # so that sub-catalogs also use the updated search method

If that's the approach we want to go with, that's fine, but maybe we should document this workaround for users who want to work with Planetary Computer.

What do you think?

Member

Thank you for taking the time to write all that up! I think it is fine for those methods to fail on Planetary Computer as long as a clear error surfaces. Requiring collections is not technically compliant with the spec, so I think it is better not to bake in special handling for this scenario, especially since it is likely to produce a surprising user experience (setting collections to include every collection might be very slow).
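
For reference, the "let the error raise" behaviour discussed in this thread can be sketched with a toy model. Everything below is hypothetical: `ToyClient` and its `requires_collections` flag are illustrative stand-ins, not pystac-client classes, and the local `APIError` only mimics the exception named in the diff.

```python
from typing import Dict, Iterator, List, Optional


class APIError(Exception):
    """Local stand-in for the APIError caught in the diff (assumption)."""


class ToyClient:
    """Hypothetical model of a STAC API client; not pystac-client itself."""

    def __init__(
        self, id: str, items: Dict[str, List[str]], requires_collections: bool
    ) -> None:
        self.id = id
        self._items = items  # collection id -> item ids
        self._requires_collections = requires_collections

    def search(self, collections: Optional[List[str]] = None) -> Iterator[str]:
        if collections is None:
            if self._requires_collections:
                # Models a server that rejects /search without collections.
                raise APIError("collections is required by this server")
            collections = list(self._items)
        for collection_id in collections:
            yield from self._items.get(collection_id, [])

    def get_items(self, recursive: bool = True) -> Iterator[str]:
        if recursive:
            # No fallback: a server-side APIError propagates to the caller.
            yield from self.search()
        else:
            # Non-recursive: restrict the search to this catalog's own id.
            yield from self.search(collections=[self.id])
```

With this shape, a recursive call against a server that requires collections fails on the first `next()` instead of silently expanding to every collection, while a non-recursive call still works because it always passes an explicit collection id.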

else:
search = self.search(ids=ids, collections=[self.id])
yield from search.items()
else:
self._warn_about_fallback("ITEM_SEARCH")
for item in super().get_items(
*ids, recursive=recursive is None or recursive
):
for item in super().get_items(*ids, recursive=recursive):
call_modifier(self.modifier, item)
yield item
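
The signature change above replaces a tri-state `recursive: bool | None = None` with a plain `recursive: bool = True`. A minimal sketch of how the two forms treat the flag, using hypothetical helper names that do not exist in pystac-client:

```python
from typing import Optional


def old_effective_recursive(recursive: Optional[bool] = None) -> bool:
    """Mirrors the removed expression: an unset (None) flag counts as True."""
    return recursive is None or recursive


def new_effective_recursive(recursive: bool = True) -> bool:
    """Mirrors the new default: the flag is used as a plain truth value."""
    return bool(recursive)


# Callers who omit the flag or pass a bool see the same result in both forms.
for flag in (True, False):
    assert old_effective_recursive(flag) == new_effective_recursive(flag)
assert old_effective_recursive() == new_effective_recursive()
```

One difference worth keeping in mind, assuming callers exist that pass `None` explicitly: the old expression treated `None` as recursive, whereas a plain truthiness check treats it as non-recursive.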

232 changes: 232 additions & 0 deletions tests/cassettes/test_client/test_get_items_non_recursion.yaml

Large diffs are not rendered by default.

354 changes: 354 additions & 0 deletions tests/cassettes/test_client/test_recursion_on_fallback.yaml

Large diffs are not rendered by default.

43 changes: 41 additions & 2 deletions tests/test_client.py
@@ -738,19 +738,58 @@ def test_collections_are_clients() -> None:


@pytest.mark.vcr
def test_get_items_without_ids() -> None:
def test_get_items_recursion_collections_required_without_ids() -> None:
"""
Make sure recursion using /search works when the server requires collections
when searching
"""
client = Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1/",
)
next(client.get_items())


@pytest.mark.vcr
def test_get_items_recursion_no_collections_without_ids() -> None:
"""
Make sure recursion using /search works when the server does not require collections
when searching
"""
client = Client.open(
"https://paituli.csc.fi/geoserver/ogc/stac/v1/",
)
next(client.get_items())


@pytest.mark.vcr
def test_get_items_non_recursion() -> None:
"""Make sure that non-recursive search is used when using /search"""
client = Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1/",
)
with pytest.raises(StopIteration):
next(client.get_items(recursive=False))


@pytest.mark.vcr
def test_non_recursion_on_fallback() -> None:
"""
Make sure that non-recursive search using fallback only looks for
non-recursive items
"""
path = "https://raw.githubusercontent.com/stac-utils/pystac/v1.9.0/docs/example-catalog/catalog.json"
catalog = Client.from_file(path)
with pytest.warns(FallbackToPystac), pytest.raises(StopIteration):
next(catalog.get_items(recursive=False))


@pytest.mark.vcr
def test_recursion_on_fallback() -> None:
"""Make sure that recursive search using fallback looks for recursive items"""
path = "https://raw.githubusercontent.com/stac-utils/pystac/v1.9.0/docs/example-catalog/catalog.json"
catalog = Client.from_file(path)
with pytest.warns(FallbackToPystac):
[i for i in catalog.get_items()]
next(catalog.get_items())
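
The tests above call `next()` on the iterator instead of exhausting it, and use `pytest.raises(StopIteration)` to assert that nothing is yielded. A small stdlib-only sketch of that pattern, with hypothetical helper names (`items_from`, `is_empty`) standing in for `get_items`:

```python
from typing import Iterator, List


def items_from(ids: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for get_items: lazily yields the given ids."""
    yield from ids


def is_empty(iterator: Iterator[str]) -> bool:
    """Return True if the iterator yields nothing; consumes at most one item."""
    try:
        next(iterator)
    except StopIteration:
        return True
    return False
```

Pulling a single item keeps a recorded cassette small, since only the first page of results has to be fetched, whereas a list comprehension over the iterator would walk every item.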


@pytest.mark.vcr