Skip to content

Xarray's explicit indexes #35

@Huite

Description

@Huite

For examples, see:

https://github.com/martinfleis/xvec
https://github.com/dcherian/crsindex/blob/main/crsindex.ipynb

They can be set with:
https://docs.xarray.dev/en/stable/generated/xarray.DataArray.set_xindex.html

One issue is that in the UGRID conventions, the connectivity arrays are 2D. So the first dimension matches whatever's going on in the data (node, edge, face), but the second doesn't. This can be avoided with a trick using numpy structured arrays:

import numpy as np
import xarray as xr

ds = xr.open_dataset("Grevelingen_FM_grid_20190603_0001_net.nc")
faces2d = ds["NetElemNode"].values.astype(int) - 1  # face node_connectivity
triangle = np.dtype([("0", int), ("1", int), ("2", int)])
faces = np.empty(len(faces2d), dtype=triangle)
faces.view(int)[:] = faces2d.ravel()
ds = ds.assign_coords(faces=("nNetElem", faces))

This feels like a hack, and doesn't seem like a serious option.
An alternative approach is perhaps using a Pandas ExtensionArray. This seems like a lot of work: https://stackoverflow.com/questions/68893521/simple-example-of-pandas-extensionarray

Next, we can define a UgridIndex:

class Ugrid2dIndex(Index):
    def __init__(
        self,
        node_x,
        node_y,
        faces,
    ):
        self._ugrid = xu.Ugrid2d(node_x.values, node_y.values, -1, faces.values.view(int).reshape((-1, 3)))

    @classmethod
    def from_variables(cls, variables, options):
        return cls(
            variables["NetNode_x"],
            variables["NetNode_y"],
            variables["faces"],
        )

This seems to do the trick:

newds = ds.set_xindex(
    (
        "NetNode_x",
        "NetNode_y",
        "faces",
    ),
    Ugrid2dIndex,
)

Now we could start adding methods to the Ugrid2dIndex and make it possible to select values with sel. Or alternatively provide other operations via a .ugrid accessor.

However, it doesn't solve all problems that we encounter with UGRID topologies (?):

  • Node, edge, and face dimension are fully separated according to xarray's data model. Selecting one but not the other will generally invalidate the data <-> topology link. (So solve via a .ugrid.sel and .ugrid.isel instead.)
  • A full UGRID topology requires possibly connectivities with faces, edges, nodes, and possible interrelations. As far I can tell, the index is not carried around for every variable. (Solve by using multiple indexes internally containing full grid? One for faces, one for edges (with edge node connectivity). But seems impractical for data on nodes?)

I think the fundamental question is whether a UgridDataset is properly considered as an xarray Dataset: I don't think so, as it breaks a premise of fully separated, orthogonal dimensions. From that perspective, it may be better to accept this, and special case the UGRID topology dimensions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions