[BUG] download triggers a call to list bucket which is not allowed #1009

@itcarroll

Description

Is this issue already tracked somewhere, or is this a new report?

  • I've reviewed existing issues and couldn't find a duplicate for this problem.

Current Behavior

Originally reported by @zachghiaccio in a Slack thread with @chuckwondo and @jhkennedy.

At least one granule causes earthaccess.download to raise a PermissionError because the s3:ListBucket permission is denied. Downloading a granule should not require this permission.

Expected Behavior

The granule should be downloaded without error.

There's no reason for ListBucket to be called. Indeed, using the S3FileSystem directly avoids the error. This works fine:

results = earthaccess.search_data(concept_id="G3271604796-NSIDC_CPRD")
fs = earthaccess.get_s3_filesystem(results=results)
fs.get(results[0].data_links(access="direct"), "tmp/")

Steps To Reproduce

results = earthaccess.search_data(concept_id="G3271604796-NSIDC_CPRD")
paths = earthaccess.download(results, "tmp")
---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/s3fs/core.py:755, in S3FileSystem._lsdir(self, path, refresh, max_items, delimiter, prefix, versions)
    754 files = []
--> 755 async for c in self._iterdir(
    756     bucket,
    757     max_items=max_items,
    758     delimiter=delimiter,
    759     prefix=prefix,
    760     versions=versions,
    761 ):
    762     if c["type"] == "directory":

File /srv/conda/envs/notebook/lib/python3.11/site-packages/s3fs/core.py:805, in S3FileSystem._iterdir(self, bucket, max_items, delimiter, prefix, versions)
    798 it = pag.paginate(
    799     Bucket=bucket,
    800     Prefix=prefix,
   (...)
    803     **self.req_kw,
    804 )
--> 805 async for i in it:
    806     for l in i.get("CommonPrefixes", []):

File /srv/conda/envs/notebook/lib/python3.11/site-packages/aiobotocore/paginate.py:30, in AioPageIterator.__anext__(self)
     29 while True:
---> 30     response = await self._make_request(current_kwargs)
     31     parsed = self._extract_parsed_response(response)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/aiobotocore/client.py:412, in AioBaseClient._make_api_call(self, operation_name, api_params)
    411     error_class = self.exceptions.from_code(error_code)
--> 412     raise error_class(parsed_response, operation_name)
    413 else:

ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: User: arn:aws:sts::695478930278:assumed-role/s3-same-region-access-role/itcarroll is not authorized to perform: s3:ListBucket on resource: "arn:aws:s3:::nsidc-cumulus-prod-protected" with an explicit deny in a resource-based policy

The above exception was the direct cause of the following exception:

PermissionError                           Traceback (most recent call last)
Cell In[4], line 2
      1 results = earthaccess.search_data(concept_id="G3271604796-NSIDC_CPRD")
----> 2 paths = earthaccess.download(results)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/earthaccess/api.py:270, in download(granules, local_path, provider, threads, pqdm_kwargs)
    267     granules = [granules]
    269 try:
--> 270     return earthaccess.__store__.get(
    271         granules, local_path, provider, threads, pqdm_kwargs=pqdm_kwargs
    272     )
    273 except AttributeError as err:
    274     logger.error(
    275         f"{err}: You must call earthaccess.login() before you can download data"
    276     )

File /srv/conda/envs/notebook/lib/python3.11/site-packages/earthaccess/store.py:535, in Store.get(self, granules, local_path, provider, threads, pqdm_kwargs)
    528     local_path = Path.cwd() / "data" / f"{today}-{uuid}"
    530 pqdm_kwargs = {
    531     "n_jobs": threads,
    532     **(pqdm_kwargs or {}),
    533 }
--> 535 return self._get(granules, Path(local_path), provider, pqdm_kwargs=pqdm_kwargs)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/multimethod/__init__.py:350, in multimethod.__call__(self, *args, **kwargs)
    348 func = self.dispatch(*args)
    349 try:
--> 350     return func(*args, **kwargs)
    351 except TypeError as ex:
    352     raise DispatchError(f"Function {func.__code__}") from ex

File /srv/conda/envs/notebook/lib/python3.11/site-packages/earthaccess/store.py:641, in Store._get_granules(self, granules, local_path, provider, pqdm_kwargs)
    639 # TODO: make this async
    640 for file in data_links:
--> 641     s3_fs.get(file, str(local_path))
    642     file_name = local_path / Path(file).name
    643     logger.info(f"Downloaded: {file_name}")

File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    115 @functools.wraps(func)
    116 def wrapper(*args, **kwargs):
    117     self = obj or args[0]
--> 118     return sync(self.loop, func, *args, **kwargs)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
    101     raise FSTimeoutError from return_result
    102 elif isinstance(return_result, BaseException):
--> 103     raise return_result
    104 else:
    105     return return_result

File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
     54     coro = asyncio.wait_for(coro, timeout=timeout)
     55 try:
---> 56     result[0] = await coro
     57 except Exception as ex:
     58     result[0] = ex

File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/asyn.py:650, in AsyncFileSystem._get(self, rpath, lpath, recursive, callback, maxdepth, **kwargs)
    645 rpaths = await self._expand_path(
    646     rpath, recursive=recursive, maxdepth=maxdepth
    647 )
    648 if source_is_str and (not recursive or maxdepth is not None):
    649     # Non-recursive glob does not copy directories
--> 650     rpaths = [
    651         p for p in rpaths if not (trailing_sep(p) or await self._isdir(p))
    652     ]
    653     if not rpaths:
    654         return

File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/asyn.py:651, in <listcomp>(.0)
    645 rpaths = await self._expand_path(
    646     rpath, recursive=recursive, maxdepth=maxdepth
    647 )
    648 if source_is_str and (not recursive or maxdepth is not None):
    649     # Non-recursive glob does not copy directories
    650     rpaths = [
--> 651         p for p in rpaths if not (trailing_sep(p) or await self._isdir(p))
    652     ]
    653     if not rpaths:
    654         return

File /srv/conda/envs/notebook/lib/python3.11/site-packages/s3fs/core.py:1556, in S3FileSystem._isdir(self, path)
   1554 # This only returns things within the path and NOT the path object itself
   1555 try:
-> 1556     return bool(await self._lsdir(path))
   1557 except FileNotFoundError:
   1558     return False

File /srv/conda/envs/notebook/lib/python3.11/site-packages/s3fs/core.py:768, in S3FileSystem._lsdir(self, path, refresh, max_items, delimiter, prefix, versions)
    766     files += dirs
    767 except ClientError as e:
--> 768     raise translate_boto_error(e)
    770 if delimiter and files and not versions:
    771     self.dircache[path] = files

PermissionError: User: arn:aws:sts::695478930278:assumed-role/s3-same-region-access-role/itcarroll is not authorized to perform: s3:ListBucket on resource: "arn:aws:s3:::nsidc-cumulus-prod-protected" with an explicit deny in a resource-based policy
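
The fsspec/asyn.py frames above hint at why the two calls behave differently. The `_isdir` check, and so the ListObjectsV2 request, sits behind `source_is_str`. `Store._get_granules` passes one string per file, while the working snippet passes a list. Below is a toy model of that branch, not fsspec's real code. `ToyFS`, `list_calls`, and the bucket/key names are made up for illustration, and it assumes `source_is_str` is true only when the source argument is a single `str`.

```python
class ToyFS:
    """Hypothetical stand-in that mimics only the branch seen in the traceback."""

    def __init__(self):
        # Records each simulated listing request (stand-in for ListObjectsV2).
        self.list_calls = []

    def isdir(self, path):
        # The real S3FileSystem._isdir lists the prefix; here we only record the call.
        self.list_calls.append(path)
        return False

    def get(self, rpath, lpath):
        # Assumption: a single str source sets source_is_str, a list does not.
        source_is_str = isinstance(rpath, str)
        rpaths = [rpath] if source_is_str else list(rpath)
        if source_is_str:
            # Only string sources are filtered through the directory check.
            rpaths = [p for p in rpaths if not (p.endswith("/") or self.isdir(p))]
        return rpaths  # paths that would be copied to lpath


key = "s3://example-bucket/granule.h5"  # made-up path

fs_str = ToyFS()
fs_str.get(key, "tmp/")
print(fs_str.list_calls)  # ['s3://example-bucket/granule.h5']

fs_list = ToyFS()
fs_list.get([key], "tmp/")
print(fs_list.list_calls)  # []
```

If this reading is right, passing the links as a list inside `Store._get_granules` might avoid the listing, though that is untested here.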

Environment

  • OS: CryoCloud
  • earthaccess: 0.14.0
  • earthaccess.store.in_region: True
