[Proposal] Allow to filter for xarray coordinates in EDR data sets #2006


Open: JohannesSchnell wants to merge 7 commits into master from dim_select_xarray_prov

JohannesSchnell (Contributor) commented on May 5, 2025

Overview

Hello,
xarray's data model consists of dimensions, coordinates and data variables. So far, the data variables of an EDR collection can be selected with the parameter-name query parameter of position/cube requests. However, a dataset can also contain additional coordinates besides time and lat/lon (which a valid collection definition in config.yaml needs anyway, e.g. as time_field, x_field and y_field). In that case we get back a flattened list of values, and we know neither how many coordinate dimensions were flattened nor their lengths, so the list cannot be reshaped into its corresponding multi-dimensional array.
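To make the problem concrete, here is a minimal sketch using a small in-memory dataset with the same dimension names as the test file further below (sizes shrunk, values random). It only mimics what a position query does using plain xarray calls; it does not reproduce the provider's actual serialisation.

```python
import numpy as np
import xarray as xr

# Small stand-in for the test dataset (same dimension names, smaller sizes)
ds = xr.Dataset(
    {"temp": (["time", "model", "epoche", "lat", "lon"], np.random.rand(2, 3, 4, 5, 5))},
    coords={
        "time": np.arange(2),
        "model": ["hg_1", "hg_2", "hg_3"],
        "epoche": np.arange(1, 5),
        "lat": np.linspace(-90, 90, 5),
        "lon": np.linspace(-180, 180, 5),
    },
)

# A position query fixes lat/lon, but model and epoche remain as extra dimensions
point = ds["temp"].sel(lat=45, lon=90, method="nearest")
print(point.dims, point.shape)  # ('time', 'model', 'epoche') (2, 3, 4)

# Once serialised as a flat list, the shape information is gone:
# 24 values, but a client cannot tell which value belongs to which model/epoche
flat = point.values.ravel().tolist()
print(len(flat))  # 24
```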

To address this, we could add another query parameter called dims that takes comma-separated key:value pairs to select specific values of xarray's coordinates: dims=coord1:value1,coord2:value2,...,coordN:valueN. Only one value is allowed per coordinate, and each coordinate may be selected only once.
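A minimal sketch of how such a dims value could be parsed, assuming a hypothetical helper parse_dims (not part of pygeoapi). Each pair is split only on its first colon, so values containing colons (such as ISO timestamps) stay intact, while a second value for the same coordinate (epoche:3,4) or a repeated coordinate is rejected.

```python
from __future__ import annotations


def parse_dims(value: str) -> dict[str, str]:
    """Parse 'coord1:value1,coord2:value2' into {'coord1': 'value1', ...}.

    Hypothetical helper enforcing the two rules of the proposal:
    one value per coordinate, and each coordinate at most once.
    """
    selection: dict[str, str] = {}
    for pair in value.split(","):
        # Split on the first colon only, so 'time:2020-01-01T00:00:00' still works
        coord, sep, coord_value = pair.partition(":")
        coord, coord_value = coord.strip(), coord_value.strip()
        if not sep or not coord or not coord_value:
            # Also catches a second value such as the '4' in 'epoche:3,4'
            raise ValueError(f"expected 'coord:value', got {pair!r}")
        if coord in selection:
            raise ValueError(f"coordinate {coord!r} selected more than once")
        selection[coord] = coord_value
    return selection


print(parse_dims("epoche:3,model:hg_5"))  # {'epoche': '3', 'model': 'hg_5'}
```

Values are kept as strings here; casting them to the coordinate's dtype is left to the selection step, which knows the dataset.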

In this PR I implemented:

  • the new query parameter dims
  • the logic to select values from the dataset's coordinates
  • a new filter_dims object in the metadata response of GET /collections/{collection_name}
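The selection and metadata steps could look roughly like the following sketch. select_dims and filter_dims_metadata are hypothetical names, the filter_dims structure (a list of values per coordinate) is an assumption rather than the format used in this PR, and the excluded coordinate names are hardcoded here, whereas the provider would take them from time_field, x_field and y_field.

```python
import numpy as np
import xarray as xr

# Assumption: these come from time_field / x_field / y_field in the real provider
SKIP = ("time", "lat", "lon")


def select_dims(ds: xr.Dataset, selection: dict, skip=SKIP) -> xr.Dataset:
    """Apply a parsed dims selection (hypothetical helper)."""
    indexers = {}
    for coord, raw in selection.items():
        if coord not in ds.coords or coord in skip:
            raise ValueError(f"{coord!r} is not a selectable coordinate")
        # Query values arrive as strings; cast them for numeric coordinates
        kind = ds[coord].dtype.kind
        if kind in "iu":
            indexers[coord] = int(raw)
        elif kind == "f":
            indexers[coord] = float(raw)
        else:
            indexers[coord] = raw  # string/object coordinates are compared as-is
    # Scalar indexers drop the selected dimensions from the result
    return ds.sel(indexers)


def filter_dims_metadata(ds: xr.Dataset, skip=SKIP) -> dict:
    """List the selectable values of each extra 1-D coordinate (assumed format)."""
    return {
        name: {"values": ds[name].values.tolist()}
        for name in ds.coords
        if name not in skip and ds[name].ndim == 1
    }


ds = xr.Dataset(
    {"temp": (["model", "epoche"], np.arange(6.0).reshape(2, 3))},
    coords={"model": ["hg_1", "hg_2"], "epoche": [1, 2, 3]},
)
subset = select_dims(ds, {"epoche": "3", "model": "hg_2"})
print(float(subset["temp"]))  # 5.0
print(filter_dims_metadata(ds))
```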

Topics TBD:

  • define a schema for the new dims query parameter
  • make it appear in the OpenAPI/Swagger docs
  • writing tests and adding new test data
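For the open schema question, one possible starting point is an OpenAPI 3.0 parameter object with a regular-expression pattern, written here as a Python dict. This is an assumption, not an agreed definition: the pattern checks the coord:value syntax, but it cannot express the "each coordinate only once" rule, which would still have to be validated server-side.

```python
import re

# Hypothetical OpenAPI 3.0 parameter object for the proposed dims parameter.
# pattern: one or more 'coord:value' pairs separated by commas; values may
# contain colons (e.g. timestamps) but not commas.
DIMS_PARAMETER = {
    "name": "dims",
    "in": "query",
    "required": False,
    "description": "Select one value per additional coordinate, e.g. epoche:3,model:hg_5",
    "schema": {
        "type": "string",
        "pattern": r"^[^:,]+:[^,]+(,[^:,]+:[^,]+)*$",
    },
}

pattern = DIMS_PARAMETER["schema"]["pattern"]
print(bool(re.fullmatch(pattern, "epoche:3,model:hg_5")))  # True
print(bool(re.fullmatch(pattern, "epoche:3,4")))  # False
```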

Related Issue / discussion

Additional information

For testing you can create a .nc file like so:

import xarray as xr
import numpy as np
import pandas as pd

# Create dimension values
lat = np.linspace(-90, 90, 10)
lon = np.linspace(-180, 180, 10)
time = pd.date_range("2020-01-01", periods=10, freq='D')
model = [f"hg_{i}" for i in range(1, 6)]
epoche = np.arange(1, 51)

# Create a random temp variable with shape: (time, model, epoche, lat, lon)
np.random.seed(123)
temp_data = np.random.rand(len(time), len(model), len(epoche), len(lat), len(lon))

# Create the dataset
ds = xr.Dataset(
    {
        "temp": (["time", "model", "epoche", "lat", "lon"], temp_data)
    },
    coords={
        "lat": lat,
        "lon": lon,
        "time": time,
        "model": model,
        "epoche": epoche
    }
)

ds.to_netcdf('./dim.nc')

Then copy (or mount) dim.nc into the data directory and add the following entry to the resources section of your config.yaml:

  dim:
    description: dims
    extents:
      spatial:
        bbox:
        - -179.75
        - -89.75
        - 179.75
        - 89.75
        crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
      temporal:
        begin: 2020-01-01 00:00:00+00:00
        end: 2020-01-10 00:00:00+00:00
    keywords:
    - country
    providers:
    - data: /pygeoapi/data/dim.nc
      format:
        mimetype: application/x-netcdf
        name: netcdf
      name: xarray-edr
      time_field: time
      type: edr
      x_field: lon
      y_field: lat
    title: dim
    type: collection

A request would then look like this (first unencoded, then URL-encoded):

http://localhost:5000/collections/dim/position?f=json&coords=POINT(5 52)&parameter-name=temp&dims=epoche:3,model:hg_5

http://localhost:5000/collections/dim/position?f=json&coords=POINT(5%2052)&parameter-name=temp&dims=epoche%3A3,model%3Ahg_5
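A client can also build the encoded request programmatically. This sketch uses only the Python standard library; the selection dict is just an illustration. Passing quote instead of the default quote_plus encodes the space as %20, and keeping "(", ")" and "," unescaped reproduces the encoded URL above exactly.

```python
from urllib.parse import quote, urlencode

# Illustrative coordinate selection for the dims parameter
selection = {"epoche": 3, "model": "hg_5"}

params = {
    "f": "json",
    "coords": "POINT(5 52)",
    "parameter-name": "temp",
    "dims": ",".join(f"{coord}:{value}" for coord, value in selection.items()),
}

# quote encodes ' ' as %20 and ':' as %3A; '(', ')' and ',' are kept as-is
query = urlencode(params, quote_via=quote, safe="(),")
url = f"http://localhost:5000/collections/dim/position?{query}"
print(url)
```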

Dependency policy (RFC2)

  • I have ensured that this PR meets RFC2 requirements

Updates to public demo

Contributions and licensing

(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)

  • I'd like to contribute [feature X|bugfix Y|docs|something else] to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution
  • I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

JohannesSchnell force-pushed the dim_select_xarray_prov branch from 19ae672 to 9549357 on May 5, 2025 at 14:35
JohannesSchnell force-pushed the dim_select_xarray_prov branch from 9549357 to cbd86f0 on May 5, 2025 at 14:37