Commit 3d29abc

committed
Update I/O docs
1 parent 8be1813 commit 3d29abc

3 files changed

+195
-74
lines changed

docs/api.rst

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -171,6 +171,20 @@ StacIO
171171
:members:
172172
:undoc-members:
173173

174+
DefaultStacIO
175+
~~~~~~~~~~~~~
176+
177+
.. autoclass:: pystac.stac_io.DefaultStacIO
178+
:members:
179+
:show-inheritance:
180+
181+
DuplicateKeyReportingMixin
182+
~~~~~~~~~~~~~~~~~~~~~~~~~~
183+
184+
.. autoclass:: pystac.stac_io.DuplicateKeyReportingMixin
185+
:members:
186+
:show-inheritance:
187+
174188
STAC_IO
175189
~~~~~~~
176190

@@ -213,11 +227,47 @@ STACError
213227

214228
.. autoclass:: pystac.STACError
215229

230+
STACTypeError
231+
~~~~~~~~~~~~~
232+
233+
.. autoclass:: pystac.STACTypeError
234+
235+
DuplicateObjectKeyError
236+
~~~~~~~~~~~~~~~~~~~~~~~
237+
238+
.. autoclass:: pystac.DuplicateObjectKeyError
239+
240+
ExtensionAlreadyExistsError
241+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
242+
243+
.. autoclass:: pystac.ExtensionAlreadyExistsError
244+
245+
ExtensionTypeError
246+
~~~~~~~~~~~~~~~~~~
247+
248+
.. autoclass:: pystac.ExtensionTypeError
249+
250+
ExtensionNotImplemented
251+
~~~~~~~~~~~~~~~~~~~~~~~
252+
253+
.. autoclass:: pystac.ExtensionNotImplemented
254+
216255
ExtensionTypeError
217256
~~~~~~~~~~~~~~~~~~
218257

219258
.. autoclass:: pystac.ExtensionTypeError
220259

260+
RequiredPropertyMissing
261+
~~~~~~~~~~~~~~~~~~~~~~~
262+
263+
.. autoclass:: pystac.RequiredPropertyMissing
264+
265+
STACValidationError
266+
~~~~~~~~~~~~~~~~~~~
267+
268+
.. autoclass:: pystac.STACValidationError
269+
270+
221271
Extensions
222272
----------
223273

docs/concepts.rst

Lines changed: 77 additions & 53 deletions
@@ -225,72 +225,96 @@ written (e.g. if you are working with self-contained catalogs).
225225

226226
.. _using stac_io:
227227

228-
Using STAC_IO
228+
I/O in PySTAC
229229
=============
230230

231-
The :class:`~pystac.STAC_IO` class is the way PySTAC reads and writes text from file
232-
locations. Since PySTAC aims to be dependency-free, there is no default mechanisms to
233-
read and write from anything but the local file system. However, users of PySTAC may
234-
want to read and write from other file systems, such as HTTP or cloud object storage.
235-
STAC_IO allows users to hook into PySTAC and define their own reading and writing
236-
primitives to allow for those use cases.
237-
238-
To enable reading from other types of file systems, it is recommended that in the
239-
`__init__.py` of the client module, or at the beginning of the script using PySTAC, you
240-
overwrite the :func:`STAC_IO.read_text_method <pystac.STAC_IO.read_text_method>` and
241-
:func:`STAC_IO.write_text_method <pystac.STAC_IO.write_text_method>` members of STAC_IO
242-
with functions that read and write however you need. For example, this code will allow
231+
The :class:`pystac.StacIO` class defines fundamental methods for I/O
232+
operations within PySTAC, including serialization and deserialization to and from
233+
JSON files and conversion to and from Python dictionaries. This is an abstract class
234+
and should not be instantiated directly. However, PySTAC provides a
235+
:class:`pystac.stac_io.DefaultStacIO` class with minimal implementations of these
236+
methods. This default implementation provides support for reading and writing files
237+
from the local filesystem as well as HTTP URIs (using ``urllib``). This class is
238+
created automatically by all of the object-specific I/O methods (e.g.
239+
:meth:`pystac.Catalog.from_file`), so most users will not need to instantiate this
240+
class themselves.
241+
242+
If you require custom logic for I/O operations or would like to use a 3rd-party library
243+
for I/O operations (e.g. ``requests``), you can create a sub-class of
244+
:class:`pystac.StacIO` (or :class:`pystac.stac_io.DefaultStacIO`) and customize the methods as
245+
you see fit. You can then pass instances of this custom sub-class into the ``stac_io``
246+
argument of most object-specific I/O methods. You can also use
247+
:meth:`pystac.StacIO.set_default` in your client's ``__init__.py`` file to make this
248+
sub-class the default :class:`pystac.StacIO` implementation throughout the library.
249+
250+
For example, this code will allow
243251
for reading from AWS's S3 cloud object storage using `boto3
244-
<https://boto3.amazonaws.com/v1/documentation/api/latest/index.html>`_:
252+
<https://boto3.amazonaws.com/v1/documentation/api/latest/index.html>`__:
245253

246254
.. code-block:: python
247255
248256
from urllib.parse import urlparse
249257
import boto3
250-
from pystac import STAC_IO
251-
252-
def my_read_method(uri):
253-
parsed = urlparse(uri)
254-
if parsed.scheme == 's3':
255-
bucket = parsed.netloc
256-
key = parsed.path[1:]
257-
s3 = boto3.resource('s3')
258-
obj = s3.Object(bucket, key)
259-
return obj.get()['Body'].read().decode('utf-8')
260-
else:
261-
return STAC_IO.default_read_text_method(uri)
262-
263-
def my_write_method(uri, txt):
264-
parsed = urlparse(uri)
265-
if parsed.scheme == 's3':
266-
bucket = parsed.netloc
267-
key = parsed.path[1:]
268-
s3 = boto3.resource("s3")
269-
s3.Object(bucket, key).put(Body=txt)
270-
else:
271-
STAC_IO.default_write_text_method(uri, txt)
272-
273-
STAC_IO.read_text_method = my_read_method
274-
STAC_IO.write_text_method = my_write_method
275-
276-
If you are only going to read from another source, e.g. HTTP, you could only replace the
277-
read method. For example, using the `requests library
278-
<https://requests.kennethreitz.org/en/master>`_:
258+
from pystac import Link
259+
from pystac.stac_io import DefaultStacIO, StacIO
260+
261+
class CustomStacIO(DefaultStacIO):
262+
def __init__(self):
263+
self.s3 = boto3.resource("s3")
264+
265+
def read_text(
266+
self, source: Union[str, Link], *args: Any, **kwargs: Any
267+
) -> str:
268+
parsed = urlparse(source)
269+
if parsed.scheme == "s3":
270+
bucket = parsed.netloc
271+
key = parsed.path[1:]
272+
273+
obj = self.s3.Object(bucket, key)
274+
return obj.get()["Body"].read().decode("utf-8")
275+
else:
276+
return super().read_text(source, *args, **kwargs)
277+
278+
def write_text(
279+
self, dest: Union[str, Link], txt: str, *args: Any, **kwargs: Any
280+
) -> None:
281+
parsed = urlparse(dest)
282+
if parsed.scheme == "s3":
283+
bucket = parsed.netloc
284+
key = parsed.path[1:]
285+
s3 = boto3.resource("s3")
286+
s3.Object(bucket, key).put(Body=txt, ContentEncoding="utf-8")
287+
else:
288+
super().write_text(dest, txt, *args, **kwargs)
289+
290+
StacIO.set_default(CustomStacIO)
291+
292+
293+
If you only need to customize read operations you can inherit from
294+
:class:`~pystac.stac_io.DefaultStacIO` and only overwrite the read method. For example,
295+
to take advantage of connection pooling using a `requests.Session
296+
<https://requests.kennethreitz.org/en/master>`__:
279297

280298
.. code-block:: python
281299
282300
from urllib.parse import urlparse
283301
import requests
284-
from pystac import STAC_IO
285-
286-
def my_read_method(uri):
287-
parsed = urlparse(uri)
288-
if parsed.scheme.startswith('http'):
289-
return requests.get(uri).text
290-
else:
291-
return STAC_IO.default_read_text_method(uri)
292-
293-
STAC_IO.read_text_method = my_read_method
302+
from pystac.stac_io import DefaultStacIO, StacIO
303+
304+
class ConnectionPoolingIO(DefaultStacIO):
305+
def __init__(self):
306+
self.session = requests.Session()
307+
308+
def read_text(
309+
self, source: Union[str, Link], *args: Any, **kwargs: Any
310+
) -> str:
311+
parsed = urlparse(source)
312+
if parsed.scheme.startswith("http"):
313+
return self.session.get(source).text
314+
else:
315+
return super().read_text(source, *args, **kwargs)
316+
317+
StacIO.set_default(ConnectionPoolingIO)
294318
295319
Validation
296320
==========
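
Note on the S3 example in the docs/concepts.rst diff: its S3 branch depends on splitting an ``s3://`` URI into a bucket name and an object key with ``urlparse``. The sketch below isolates that step using only the standard library. The helper name ``split_s3_uri`` is hypothetical and is not part of PySTAC or boto3, and it assumes the source has already been resolved to a string href.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// URI into (bucket, key). Hypothetical helper for illustration."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"Not an S3 URI: {uri}")
    # netloc holds the bucket name; the path starts with "/" which is not
    # part of the object key, so it is dropped.
    return parsed.netloc, parsed.path[1:]


bucket, key = split_s3_uri("s3://my-bucket/catalogs/catalog.json")
print(bucket, key)
```

Keeping the parsing in one place means the read and write methods can share it instead of repeating the ``netloc`` and ``path[1:]`` logic.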

pystac/stac_io.py

Lines changed: 68 additions & 21 deletions
@@ -46,14 +46,19 @@ def read_text(
4646
) -> str:
4747
"""Read text from the given URI.
4848
49-
The source to read from can be specified
50-
as a string or a Link. If it's a string, it's the URL of the HREF from which to
51-
read. When reading links, PySTAC will pass in the entire link body.
52-
This enables implementations to utilize additional link information,
53-
e.g. the "post" information in a pagination link from a STAC API search.
49+
The source to read from can be specified as a string or a
50+
:class:`~pystac.Link`. If it is a string, it must be a URI or local path from
51+
which to read. Using a :class:`~pystac.Link` enables implementations to use
52+
additional link information, such as paging information contained in the
53+
extended links described in the `STAC API spec
54+
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search#paging>`__.
5455
5556
Args:
5657
source : The source to read from.
58+
*args : Arbitrary positional arguments that may be utilized by the concrete
59+
implementation.
60+
**kwargs : Arbitrary keyword arguments that may be utilized by the concrete
61+
implementation.
5762
5863
Returns:
5964
str: The text contained in the file at the location specified by the uri.
@@ -66,10 +71,10 @@ def write_text(
6671
) -> None:
6772
"""Write the given text to a file at the given URI.
6873
69-
The destination to write to from can be specified
70-
as a string or a Link. If it's a string, it's the URL of the HREF from which to
71-
read. When writing based on links links, PySTAC will pass in the entire
72-
link body.
74+
The destination to write to can be specified as a string or a
75+
:class:`~pystac.Link`. If it is a string, it must be a URI or local path to
76+
which to write. Using a :class:`~pystac.Link` enables implementations to use
77+
additional link information.
7378
7479
Args:
7580
dest : The destination to write to.
@@ -122,6 +127,21 @@ def stac_object_from_dict(
122127
root: Optional["Catalog_Type"] = None,
123128
preserve_dict: bool = True,
124129
) -> "STACObject_Type":
130+
"""Deserializes a :class:`~pystac.STACObject` sub-class instance from a
131+
dictionary.
132+
133+
Args:
134+
135+
d : The dictionary to deserialize
136+
href : Optional href to associate with the STAC object
137+
root : Optional root :class:`~pystac.Catalog` to associate with the
138+
STAC object.
139+
preserve_dict: If ``False``, the dict parameter ``d`` may be modified
140+
during this method call. Otherwise the dict is not mutated.
141+
Defaults to ``True``, which results in a deepcopy of the
142+
parameter. Set to ``False`` when possible to avoid the performance
143+
hit of a deepcopy.
144+
"""
125145
if identify_stac_object_type(d) == pystac.STACObjectType.ITEM:
126146
collection_cache = None
127147
if root is not None:
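
Note on ``preserve_dict``: the trade-off described in the ``stac_object_from_dict`` docstring can be sketched with the standard library alone. The function ``from_dict_sketch`` is a hypothetical stand-in, not PySTAC's implementation. It only illustrates why the default copies the input and what skipping the copy implies for the caller.

```python
from copy import deepcopy
from typing import Any, Dict


def from_dict_sketch(d: Dict[str, Any], preserve_dict: bool = True) -> Dict[str, Any]:
    """Hypothetical stand-in for a deserializer that consumes keys from its input."""
    if preserve_dict:
        # Work on a deep copy so the caller's dictionary is left untouched.
        d = deepcopy(d)
    # Simulate deserialization that pops known fields off the dictionary.
    return {"id": d.pop("id"), "extra_fields": d}


original = {"id": "item-1", "properties": {"datetime": None}}
from_dict_sketch(original)                       # original keeps its "id" key
from_dict_sketch(original, preserve_dict=False)  # original loses its "id" key
```

Passing ``preserve_dict=False`` is only safe when the caller no longer needs the input dictionary, for example when it was just parsed from JSON.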
@@ -244,18 +264,31 @@ def default(cls) -> "StacIO":
244264

245265
class DefaultStacIO(StacIO):
246266
def read_text(
247-
self, source: Union[str, "Link_Type"], *args: Any, **kwargs: Any
267+
self, source: Union[str, "Link_Type"], *_: Any, **__: Any
248268
) -> str:
269+
"""A concrete implementation of :meth:`StacIO.read_text <pystac.StacIO.read_text>`. Converts the
270+
``source`` argument to a string (if it is not already) and delegates to
271+
:meth:`DefaultStacIO.read_text_from_href` for opening and reading the file."""
249272
href: Optional[str]
250273
if isinstance(source, str):
251274
href = source
252275
else:
253276
href = source.get_absolute_href()
254277
if href is None:
255278
raise IOError(f"Could not get an absolute HREF from link {source}")
256-
return self.read_text_from_href(href, *args, **kwargs)
279+
return self.read_text_from_href(href)
280+
281+
def read_text_from_href(self, href: str) -> str:
282+
"""Reads file as a UTF-8 string.
283+
284+
If ``href`` has a "scheme" (e.g. if it starts with "https://") then this will
285+
use :func:`urllib.request.urlopen` to open the file and read the contents;
286+
otherwise, :func:`open` will be used to open a local file.
287+
288+
Args:
257289
258-
def read_text_from_href(self, href: str, *args: Any, **kwargs: Any) -> str:
290+
href : The URI of the file to open.
291+
"""
259292
parsed = safe_urlparse(href)
260293
href_contents: str
261294
if parsed.scheme != "":
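
Note on the dispatch rule in the ``read_text_from_href`` docstring (``urlopen`` when the href has a scheme, ``open`` otherwise): a simplified sketch follows. It is an illustration rather than the library's code. The name ``read_text_sketch`` is hypothetical, and it uses plain ``urlparse`` where the diff uses ``safe_urlparse``, so any path that ``urlparse`` reports as having a scheme is treated as a URL.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def read_text_sketch(href: str) -> str:
    """Read href as UTF-8 text: urlopen for hrefs with a scheme, open() otherwise."""
    if urlparse(href).scheme != "":
        # Any scheme that urllib supports (http, https, file, ...) goes through urlopen.
        with urlopen(href) as response:
            return response.read().decode("utf-8")
    # No scheme: treat the href as a local file path.
    with open(href, encoding="utf-8") as f:
        return f.read()
```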
@@ -270,20 +303,33 @@ def read_text_from_href(self, href: str, *args: Any, **kwargs: Any) -> str:
270303
return href_contents
271304

272305
def write_text(
273-
self, dest: Union[str, "Link_Type"], txt: str, *args: Any, **kwargs: Any
306+
self, dest: Union[str, "Link_Type"], txt: str, *_: Any, **__: Any
274307
) -> None:
308+
"""A concrete implementation of :meth:`StacIO.write_text <pystac.StacIO.write_text>`. Converts the
309+
``dest`` argument to a string (if it is not already) and delegates to
310+
:meth:`DefaultStacIO.write_text_to_href` for opening and writing the file."""
275311
href: Optional[str]
276312
if isinstance(dest, str):
277313
href = dest
278314
else:
279315
href = dest.get_absolute_href()
280316
if href is None:
281317
raise IOError(f"Could not get an absolute HREF from link {dest}")
282-
return self.write_text_to_href(href, txt, *args, **kwargs)
318+
return self.write_text_to_href(href, txt)
283319

284320
def write_text_to_href(
285-
self, href: str, txt: str, *args: Any, **kwargs: Any
321+
self, href: str, txt: str
286322
) -> None:
323+
"""Writes text to file using UTF-8 encoding.
324+
325+
This implementation uses :func:`open` and therefore can only write to the local
326+
file system.
327+
328+
Args:
329+
330+
href : The path to which the file will be written.
331+
txt : The string content to write to the file.
332+
"""
287333
dirname = os.path.dirname(href)
288334
if dirname != "" and not os.path.isdir(dirname):
289335
os.makedirs(dirname)
@@ -292,16 +338,16 @@ def write_text_to_href(
292338

293339

294340
class DuplicateKeyReportingMixin(StacIO):
295-
"""A mixin for StacIO implementations that will report
341+
"""A mixin for :class:`pystac.StacIO` implementations that will report
296342
on duplicate keys in the JSON being read in.
297343
298344
See https://github.com/stac-utils/pystac/issues/313
299345
"""
300346

301347
def json_loads(self, txt: str, *_: Any, **__: Any) -> Dict[str, Any]:
302-
"""Overwrites :meth:`StacIO.json_loads` as the internal method used by
303-
:class:`DuplicateKeyReportingMixin` for deserializing a JSON string to a
304-
dictionary while checking for duplicate object keys.
348+
"""Overwrites :meth:`StacIO.json_loads <pystac.StacIO.json_loads>` as the
349+
internal method used by :class:`DuplicateKeyReportingMixin` for deserializing
350+
a JSON string to a dictionary while checking for duplicate object keys.
305351
306352
Raises:
307353
@@ -315,8 +361,9 @@ def json_loads(self, txt: str, *_: Any, **__: Any) -> Dict[str, Any]:
315361
def read_json(
316362
self, source: Union[str, "Link_Type"], *args: Any, **kwargs: Any
317363
) -> Dict[str, Any]:
318-
"""Overwrites :meth:`StacIO.read_json` for deserializing a JSON file to a
319-
dictionary while checking for duplicate object keys.
364+
"""Overwrites :meth:`StacIO.read_json <pystac.StacIO.read_json>` for
365+
deserializing a JSON file to a dictionary while checking for duplicate object
366+
keys.
320367
321368
Raises:
322369
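
Note on duplicate-key reporting: by default ``json.loads`` silently keeps the last value when a JSON object repeats a key. One standard-library way to surface such duplicates is the ``object_pairs_hook`` argument, sketched below. This illustrates the general technique and is not a copy of the mixin's code; ``DuplicateKeyError`` and ``loads_rejecting_duplicates`` are hypothetical names.

```python
import json
from typing import Any, Dict, List, Tuple


class DuplicateKeyError(ValueError):
    """Hypothetical error type used only in this sketch."""


def _reject_duplicates(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    result: Dict[str, Any] = {}
    for key, value in pairs:
        if key in result:
            raise DuplicateKeyError(f'Found duplicate object name "{key}"')
        result[key] = value
    return result


def loads_rejecting_duplicates(txt: str) -> Dict[str, Any]:
    # object_pairs_hook receives each JSON object's key/value pairs in order,
    # before they are merged into a dict, so duplicates are still visible.
    return json.loads(txt, object_pairs_hook=_reject_duplicates)
```

Because the hook runs for every nested object, duplicates are caught at any depth, not only at the top level.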
