
Handling HTTP errors in search.items() generator #712

@christophfriedrich


I'm searching a STAC catalog and then iterating over the result with the items() generator:

from pystac_client import Client

def get_search_result(bbox, start, end):
    # open the Earth Search STAC API and build a lazily evaluated item search
    catalog = Client.open("https://earth-search.aws.element84.com/v1")
    return catalog.search(
        max_items=None,
        collections=['sentinel-2-l2a'],
        bbox=bbox,
        datetime=[start + 'T00:00:00Z', end + 'T00:00:00Z'],
    )

# bbox, start and end are defined elsewhere in the worker
search = get_search_result(bbox, start, end)
for item in search.items():
    # download needed assets
    # process them into product
    ...

It's quite a lengthy loop, as each iteration takes about a minute (I don't know if that is relevant).

The other day, about 20 minutes into the loop, my worker crashed with a RemoteDisconnected error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/hsnb/./server-worker.py", line 385, in run_worker
    for item in search.items():
  File "/usr/local/lib/python3.12/site-packages/pystac_client/item_search.py", line 691, in items
    for item in self.items_as_dicts():
  File "/usr/local/lib/python3.12/site-packages/pystac_client/item_search.py", line 702, in items_as_dicts
    for page in self.pages_as_dicts():
  File "/usr/local/lib/python3.12/site-packages/pystac_client/item_search.py", line 734, in pages_as_dicts
    for page in self._stac_io.get_pages(
  File "/usr/local/lib/python3.12/site-packages/pystac_client/stac_api_io.py", line 307, in get_pages
    page = self.read_json(link, parameters=parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pystac/stac_io.py", line 205, in read_json
    txt = self.read_text(source, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pystac_client/stac_api_io.py", line 162, in read_text
    return self.request(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pystac_client/stac_api_io.py", line 218, in request
    raise APIError(str(err))
pystac_client.exceptions.APIError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Apparently something went wrong while communicating with the server. Until today, I didn't even know that iterating over the items() generator issues further HTTP requests. It makes sense, though: the results are fetched page by page, and whenever the current page is used up, the next page is requested (that is the get_pages call in the traceback).

It failed only that one time, which can happen.

But how should I handle this? Wrapping the loop in a try ... except would certainly be smart and would at least save my worker from crashing completely, but it would still throw me out of the loop. Wouldn't it be nice if pystac_client automatically retried failed requests once or twice?
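To make the idea concrete, here is a minimal sketch of what such a retry could look like. It is not an existing pystac_client feature: with_retries and flaky_fetch are hypothetical names. The point is that the retry wraps the single page request, so a transient error is absorbed before it reaches the generator, and the surrounding for loop keeps running.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call fn(), retrying on the given errors with exponential backoff.

    At most `attempts` calls are made; the error from the last call propagates.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts):
        try:
            return fn()
        except retry_on:
            # wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
            time.sleep(base_delay * 2 ** (attempt - 1))
    return fn()  # final attempt: any error goes to the caller


calls = 0


def flaky_fetch() -> list[str]:
    """Hypothetical page request that fails once, like the RemoteDisconnected error above."""
    global calls
    calls += 1
    if calls == 1:
        raise ConnectionError("Remote end closed connection without response")
    return ["item-0", "item-1"]


page = with_retries(flaky_fetch, attempts=3, base_delay=0.01)
```

Here the first call fails, the helper waits briefly, and the second call returns the page. In pystac_client the error surfaces as pystac_client.exceptions.APIError (see the traceback), which may also be raised for non-transient errors, so retrying on that class alone could retry requests that will never succeed. At the HTTP layer, the same idea is usually expressed with urllib3's Retry mounted on a requests HTTPAdapter. Whether a given pystac_client version lets you pass such a configuration in is worth checking in its documentation.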

Something similar was discussed recently in #680. That issue was closed as "not planned" because the problem was not seen as being on the pystac_client side. Maybe this example offers a new perspective on the topic?
