[skip-ci] Small updates to IO docs. (pydata#8452)

dcherian · web-flow · commit 53551666d9c6 · 2023-11-16T08:19:56.000-07:00
* [skip-ci] Small updates to IO docs.

* [skip-ci] Whats new
diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst
@@ -44,9 +44,9 @@ __ https://www.unidata.ucar.edu/software/netcdf/
 
 .. _netCDF FAQ: https://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF
 
-Reading and writing netCDF files with xarray requires scipy or the
-`netCDF4-Python`__ library to be installed (the latter is required to
-read/write netCDF V4 files and use the compression options described below).
+Reading and writing netCDF files with xarray requires scipy, h5netcdf, or the
+`netCDF4-Python`__ library to be installed. SciPy only supports reading and writing
+of netCDF V3 files.
 
 __ https://github.com/Unidata/netcdf4-python
 
@@ -675,8 +675,8 @@ the same as the one that was saved.
 
 .. note::
 
-    xarray does not write NCZarr attributes. Therefore, NCZarr data must be
-    opened in read-only mode.
+    xarray does not write `NCZarr <https://docs.unidata.ucar.edu/nug/current/nczarr_head.html>`_ attributes.
+    Therefore, NCZarr data must be opened in read-only mode.
 
 To store variable length strings, convert them to object arrays first with
 ``dtype=object``.
@@ -696,10 +696,10 @@ It is possible to read and write xarray datasets directly from / to cloud
 storage buckets using zarr. This example uses the `gcsfs`_ package to provide
 an interface to `Google Cloud Storage`_.
 
-From v0.16.2: general `fsspec`_ URLs are parsed and the store set up for you
-automatically when reading, such that you can open a dataset in a single
-call. You should include any arguments to the storage backend as the
-key ``storage_options``, part of ``backend_kwargs``.
+General `fsspec`_ URLs, those that begin with ``s3://`` or ``gcs://`` for example,
+are parsed and the store set up for you automatically when reading.
+You should include any arguments to the storage backend as the
+key ``storage_options``, part of ``backend_kwargs``.
 
 .. code:: python
 
@@ -715,7 +715,7 @@ key ``storage_options``, part of ``backend_kwargs``.
 This also works with ``open_mfdataset``, allowing you to pass a list of paths or
 a URL to be interpreted as a glob string.
 
-For older versions, and for writing, you must explicitly set up a ``MutableMapping``
+For writing, you must explicitly set up a ``MutableMapping``
 instance and pass this, as follows:
 
 .. code:: python
@@ -769,10 +769,10 @@ Consolidated Metadata
 ~~~~~~~~~~~~~~~~~~~~~
 
 Xarray needs to read all of the zarr metadata when it opens a dataset.
-In some storage mediums, such as with cloud object storage (e.g. amazon S3),
+In some storage mediums, such as with cloud object storage (e.g. `Amazon S3`_),
 this can introduce significant overhead, because two separate HTTP calls to the
 object store must be made for each variable in the dataset.
-As of xarray version 0.18, xarray by default uses a feature called
+By default Xarray uses a feature called
 *consolidated metadata*, storing all metadata for the entire dataset with a
 single key (by default called ``.zmetadata``). This typically drastically speeds
 up opening the store. (For more information on this feature, consult the
@@ -796,16 +796,20 @@ reads. Because this fall-back option is so much slower, xarray issues a
 
 .. _io.zarr.appending:
 
-Appending to existing Zarr stores
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Modifying existing Zarr stores
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Xarray supports several ways of incrementally writing variables to a Zarr
 store. These options are useful for scenarios when it is infeasible or
 undesirable to write your entire dataset at once.
 
+1. Use ``mode='a'`` to add or overwrite entire variables,
+2. Use ``append_dim`` to resize and append to existing variables, and
+3. Use ``region`` to write to limited regions of existing arrays.
+
 .. tip::
 
-    If you can load all of your data into a single ``Dataset`` using dask, a
+    For ``Dataset`` objects containing dask arrays, a
     single call to ``to_zarr()`` will write all of your data in parallel.
 
 .. warning::
@@ -876,8 +880,8 @@ and then calling ``to_zarr`` with ``compute=False`` to write only metadata
     ds.to_zarr(path, compute=False)
 
 Now, a Zarr store with the correct variable shapes and attributes exists that
-can be filled out by subsequent calls to ``to_zarr``. ``region`` can be
-specified as ``"auto"``, which opens the existing store and determines the
+can be filled out by subsequent calls to ``to_zarr``.
+``region`` can be set to ``"auto"``, which opens the existing store and determines the
 correct alignment of the new data with the existing coordinates, or as an
 explicit mapping from dimension names to Python ``slice`` objects indicating
 where the data should be written (in index space, not label space), e.g.,
diff --git a/doc/whats-new.rst b/doc/whats-new.rst
@@ -37,7 +37,7 @@ Breaking changes
 ~~~~~~~~~~~~~~~~
 - drop support for `cdms2 <https://github.com/CDAT/cdms>`_. Please use
   `xcdat <https://github.com/xCDAT/xcdat>`_ instead (:pull:`8441`).
-  By `Justus Magin <https://github.com/keewis`_.
+  By `Justus Magin <https://github.com/keewis>`_.
 
 - Following pandas, :py:meth:`infer_freq` will return ``"Y"``, ``"YS"``,
   ``"QE"``, ``"ME"``, ``"h"``, ``"min"``, ``"s"``, ``"ms"``, ``"us"``, or
@@ -94,6 +94,8 @@ Bug fixes
 
 Documentation
 ~~~~~~~~~~~~~
+- Small updates to documentation on distributed writes to Zarr: see :ref:`io.zarr.appending`.
+  By `Deepak Cherian <https://github.com/dcherian>`_.
 
 
 Internal Changes