Fix recursive search in Client.get_items #799
Changes from 2 commits
6456d5a
92ba2c1
00aff72
9f9c5f8
a7fce5b
7c9e855
967eb79
a0ebc89
8a268d2
@@ -443,27 +443,34 @@ def get_collections(self) -> Iterator[Collection]:
                 call_modifier(self.modifier, collection)
                 yield collection

-    def get_items(
-        self, *ids: str, recursive: bool | None = None
-    ) -> Iterator["Item_Type"]:
+    def get_items(self, *ids: str, recursive: bool = True) -> Iterator["Item_Type"]:
         """Return all items of this catalog.

         Args:
             ids: Zero or more item ids to find.
-            recursive: unused in pystac-client, but needed for falling back to pystac
+            recursive : If True, search this catalog and all children for the
+                item; otherwise, only search the items of this catalog. Defaults
+                to True.

         Return:
             Iterator[Item]: Iterator of items whose parent is this
                 catalog.
         """
         if self.conforms_to(ConformanceClasses.ITEM_SEARCH):
-            search = self.search(ids=ids)
+            if recursive:
+                search = self.search(ids=ids)
+                try:
+                    yield from search.items()
+                    return
+                except APIError:
+                    child_catalogs = [catalog for catalog, _, _ in self.walk()]
+                    search = self.search(ids=ids, collections=[self, *child_catalogs])
This seems like it would be pretty easy to do accidentally. I think I'd prefer to just let the error raise and make it a little harder to get every single item in Planetary Computer, for instance.

My concern is that without something like this, many functions that call

Note that the spec doesn't say one way or another that these arguments must be optional, so I'm guessing that Planetary Computer's API is still technically spec compliant. However, the examples show that a search without collections should be supported, so I don't really know one way or the other how to interpret that:

Otherwise the only way to make this work for APIs like Planetary Computer is to override the Client class like:

import pystac_client

class Client(pystac_client.Client):
    def search(self, *args, **kwargs):
        # Fill in collections only when the caller did not provide any.
        if kwargs.get("collections") is None:
            kwargs["collections"] = [self.id, *[catalog.id for catalog, _, _ in self.walk()]]
        return super().search(*args, **kwargs)

pystac_client.client.Client = Client  # so that sub-catalogs also use the updated search method

If that's the approach we want to go with, that's fine, but maybe we should document this workaround in case users want to interact with Planetary Computer. What do you think?

Thank you for taking the time to write that all up! I think as long as a clear error surfaces, it is fine to have those methods fail on Planetary Computer. Requiring collections is not technically compliant with the spec, so I think it is better not to bake in special handling for this scenario, especially since it is likely to result in a surprising user experience (setting collections to include every collection might be very, very slow).
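To make the cost concern in this thread concrete, here is a minimal, self-contained sketch of how a default `collections` list could be assembled by walking a catalog tree. `ToyCatalog` and `default_collections` are hypothetical names invented for illustration, not part of pystac or pystac-client. The sketch assumes that `walk()` yields the starting catalog first and then every descendant, which is how pystac's `Catalog.walk` is understood to behave.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a catalog; not a pystac class."""

    id: str
    children: list["ToyCatalog"] = field(default_factory=list)

    def walk(self) -> Iterator[tuple["ToyCatalog", list["ToyCatalog"], list[str]]]:
        # Assumption: yield this catalog first, then every descendant,
        # as (catalog, children, items) tuples. Items are left empty here.
        yield self, self.children, []
        for child in self.children:
            yield from child.walk()


def default_collections(root: ToyCatalog) -> list[str]:
    """Ids that would be sent as `collections` when the caller gives none."""
    # Under the assumption above, walk() already includes the root,
    # so no separate root.id entry is needed.
    return [catalog.id for catalog, _, _ in root.walk()]


root = ToyCatalog("root", [ToyCatalog("a", [ToyCatalog("a1")]), ToyCatalog("b")])
print(default_collections(root))
```

The list grows with the size of the whole tree, which is why filling it in silently for a large API could make a single call unexpectedly expensive.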
+            else:
+                search = self.search(ids=ids, collections=[self.id])
This doesn't feel quite right, since the client is a

You're right that the naming is not ideal, but that's the parameter name that the API provides. Items can be direct children of Catalogs and the API spec does not provide a separate

The other option is to skip the search endpoint for all non-recursive calls and do something like:

if self.conforms_to(ConformanceClasses.ITEM_SEARCH) and recursive:
    yield from self.search(ids=ids).items()
else:
    if not self.conforms_to(ConformanceClasses.ITEM_SEARCH):
        self._warn_about_fallback("ITEM_SEARCH")
    for item in super().get_items(
        *ids, recursive=recursive is None or recursive
    ):
        call_modifier(self.modifier, item)
        yield item

Yeah, we try to expand pystac-client with heuristics to help it work with real-world instances (rather than being strictly spec-enforcing), but this use case is unusual enough that I'm not sure it's worth the complexity to manage. I'm still not sure the problem we're trying to solve here is pystac-client's problem. As the original docstring said, we're not using

I think that the confusing thing for me as a user of pystac-client is that the recursive behaviour is inconsistent depending on whether it's using the

Currently the behaviour is:

If the solution is to just say don't use pystac-client in this case, then let's at least document this better. Maybe change this

to

Or something similar that talks about the distinction. On a personal note... I don't think I'll be able to use pystac-client in my applications if we go this route.

👍🏼 to the docs update. pystac-client is for STAC APIs, not static STAC catalogs, and our fallback to pystac is more of a convenience than a core feature.

Ok, that's fine. Just so you know, the documentation talks about pystac in a way that makes it seem like pystac is more than just a convenience, so you might understand why people might assume that pystac-client would align more closely with pystac than it does:

In that last link you even have the line (in the consequences heading): "Special care should be taken to ensure that we do not break any of PySTAC’s functionality through inheritance." Which is exactly the issue that this PR is trying to address.

Yup, I appreciate the call-out. There have been discussions over the years on whether we should even have the two libraries be separate (for one example, stac-utils/pystac#1334 (comment)). Any documentation cleanup/fixes to make things clearer for folks would be appreciated 🙇🏼. FWIW, my current thinking is that if we ever wanted to go to a v1.0 release of pystac-client, we'd want to drop inheritance altogether to avoid these problems.

No worries, that makes sense. I understand now why pystac-client is taking this approach. I've created a separate PR #800 that just updates the docstring as we discussed.
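The alternative dispatch proposed in this thread (use the search endpoint only for recursive calls, otherwise fall back) can be modelled without a network connection. The sketch below is a toy under stated assumptions: `ToyClient`, `supports_search`, `_search_items`, and `_fallback_items` are hypothetical stand-ins for the client, `conforms_to(ConformanceClasses.ITEM_SEARCH)`, `self.search(ids=ids).items()`, and `super().get_items(...)`. It only mirrors the branching of the snippet in the thread and is not pystac-client's implementation.

```python
from typing import Iterator, Optional


class ToyClient:
    """Hypothetical stand-in for the client; only the dispatch is modelled."""

    def __init__(self, supports_search: bool) -> None:
        self.supports_search = supports_search
        self.warnings: list[str] = []

    def _search_items(self, ids: tuple[str, ...]) -> Iterator[str]:
        # Stands in for self.search(ids=ids).items().
        for item_id in ids:
            yield f"search:{item_id}"

    def _fallback_items(self, ids: tuple[str, ...], recursive: bool) -> Iterator[str]:
        # Stands in for super().get_items(*ids, recursive=recursive).
        for item_id in ids:
            yield f"fallback(recursive={recursive}):{item_id}"

    def get_items(self, *ids: str, recursive: Optional[bool] = None) -> Iterator[str]:
        if self.supports_search and recursive:
            yield from self._search_items(ids)
        else:
            if not self.supports_search:
                # Stands in for self._warn_about_fallback("ITEM_SEARCH").
                self.warnings.append("ITEM_SEARCH")
            yield from self._fallback_items(ids, recursive is None or recursive)


api = ToyClient(supports_search=True)
print(list(api.get_items("x", recursive=True)))
print(list(api.get_items("x", recursive=False)))
print(list(api.get_items("x")))
```

In this model, leaving `recursive` at `None` takes the fallback branch even when search is supported, because `None` is falsy in the first condition. If the snippet were adopted as written, that default would be worth a deliberate decision.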
             yield from search.items()
         else:
             self._warn_about_fallback("ITEM_SEARCH")
-            for item in super().get_items(
-                *ids, recursive=recursive is None or recursive
-            ):
+            for item in super().get_items(*ids, recursive=recursive):
mishaschwartz marked this conversation as resolved.
                 call_modifier(self.modifier, item)
                 yield item
Large diffs are not rendered by default.