Feature design - Sensor buddies #96

@wpreimes

Background

An ISMN data collection contains measurements from multiple sensors that are, under certain circumstances, expected to be similar. Examples are measurements of the same variable (e.g. soil moisture or temperature) in a similar soil layer (depth) at the same location (lon/lat). Depending on the application, even sensors that are merely located close to each other (e.g. at the same site in different depths, in the same field, or in the same climate) are expected to share some features (e.g. the signal of precipitation events or seasonal signals). We call them "sensor buddies". There is currently no clear definition of what exactly a "buddy" is, as this depends on the application.
In this task you will implement code to enable and simplify future analyses of ISMN sensor buddies, which is a first step towards multiple potential improvements.

The ismn package on GitHub implements methods to simplify the use of ISMN data in Python. A function to create buddy groups in this package would, for example:

  • Help inter-compare "similar" sensors and detect potential outliers / malfunctioning sensors from a data collection
  • Enable combining the data from multiple sensors (e.g. to fill data gaps or generally improve the data quality) to end up with a "better" time series
  • Potentially upscale measurements from multiple locations to create a "more representative" time series from point-wise in situ measurements for soil moisture dynamics at the satellite scale (multiple kilometres)

Task description

The task is to implement methods, probably on the Sensor class (e.g. Sensor.is_buddy_sensor(self, other: Sensor) -> bool) and/or the ISMN_Interface class (e.g. ISMN_Interface.find_buddies_for_sensor(candidate_id, <SELECTION KWARGS>) -> list[Sensor]), that allow creating groups of "similar" sensors, with sufficient initial options for users to define what "similar" means for their application. This should be done within the scope of the metadata that is available for each ISMN sensor (e.g. their location or distance from each other, their measuring depth, and maybe some of the metadata classes, e.g. measurements in the same climatic region).
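A pairwise buddy test along these lines could be sketched as follows. This is only an illustration under stated assumptions: `SensorInfo`, `distance_m`, `depth_overlap_m`, `is_buddy` and the keyword names are hypothetical and not part of the ismn package; the coordinates and depths are taken from the REMEDHUS rows of the metadata excerpt in this issue.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class SensorInfo:
    """Hypothetical stand-in for the metadata an ismn Sensor carries."""
    variable: str
    lat: float
    lon: float
    depth_from: float  # upper bound of the measured layer in metres
    depth_to: float    # lower bound of the measured layer in metres


def distance_m(a: SensorInfo, b: SensorInfo) -> float:
    """Great-circle (haversine) distance between two sensors in metres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def depth_overlap_m(a: SensorInfo, b: SensorInfo) -> float:
    """Thickness of the soil layer shared by both sensors (0 if disjoint)."""
    return max(0.0, min(a.depth_to, b.depth_to) - max(a.depth_from, b.depth_from))


def is_buddy(a: SensorInfo, b: SensorInfo,
             max_dist_m: float = 100.0,
             min_depth_overlap_m: float = 0.0) -> bool:
    """Return True if b qualifies as a buddy of a under the given criteria."""
    if a.variable != b.variable:  # enforced: buddies measure the same variable
        return False
    if distance_m(a, b) > max_dist_m:
        return False
    # With the default of 0.0 the depth criterion does not restrict anything.
    return depth_overlap_m(a, b) >= min_depth_overlap_m


canizal = SensorInfo("soil_moisture", 41.19603, -5.35997, 0.0, 0.05)
carretoro = SensorInfo("soil_moisture", 41.26504, -5.38049, 0.0, 0.05)
canizal_ts = SensorInfo("soil_temperature", 41.19603, -5.35997, 0.0, 0.05)

print(is_buddy(canizal, carretoro, max_dist_m=10_000, min_depth_overlap_m=0.05))
print(is_buddy(canizal, canizal_ts, max_dist_m=10_000))
```

Keeping the same-variable check unconditional while exposing distance and depth as keyword arguments is one way to separate enforced from user-defined criteria. Sensors that report a single depth (depth_from equal to depth_to) have zero overlap in this sketch and would need separate handling.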

The focus of this task is on the initial implementation of such a feature and not on various definitions of a sensor buddy.

Mockup examples

This is how the function could be used

>> from ismn.interface import ISMN_Interface
>> ismn = ISMN_Interface("/path/to/downloaded/ismn/data")
>> ismn.find_buddies(sensor_id=1, max_dist_m=100, depth_overlap_m=0.1, same_climate=True, ...)
[2, 3, 4]   # IDs of the sensors in the collection that are buddies of sensor 1 under the chosen criteria

or also

# 2 soil moisture sensors from different stations of the same network
>> sensor1 = ismn['REMEDHUS']['Canizal']["Stevens-Hydra-Probe_soil_moisture_0.000000_0.050000"]
>> sensor2 = ismn['REMEDHUS']['Carretoro']["Stevens-Hydra-Probe_soil_moisture_0.000000_0.050000"]
# calling the new function to test
>> sensor1.has_buddy(sensor2, max_dist=10, depth_overlap_percent=100, ...)
True 

The kwargs should be implemented in a way that allows the user to define meaningful buddies (e.g. via the distance between sensors or other sensor metadata). Other "buddy" criteria should be enforced where it makes sense (e.g. only a candidate sensor that measures the same variable as the reference sensor can be a buddy).

The return value is probably either a list of Sensor objects or a list of sensor IDs. In any case, the information should allow the user to subsequently load the buddy data for further processing (e.g. via the ISMN_Interface.read_ts method).

Tools

The following tools should be sufficient to implement this feature

  1. The sensor metadata table, a pandas data frame that contains the metadata (location, environment parameters, etc.) needed to test whether a sensor should be classified as a "buddy" based on the user requirements, as well as the sensor IDs (index) and the sensor/instrument depth:
>> ismn.metadata.dropna(how='all', axis=1)
   
variable clay_fraction                climate_KG climate_insitu elevation            frm_class frm_nobs   frm_snr     idx instrument                                     lat  latitude lc_2000 lc_2005 lc_2010 lc_insitu      lon longitude   network organic_carbon                sand_fraction                saturation                silt_fraction                          station      timerange_from        timerange_to   variable                                                                     file_path      file_type
key         depth_from depth_to   val        val            val       val                  val      val       val     val depth_from depth_to                  val       val       val     val     val     val       val      val       val       val     depth_from depth_to   val    depth_from depth_to   val depth_from depth_to   val    depth_from depth_to   val               val                 val                 val depth_from depth_to               val                                                val            val
0                  0.0      0.3  18.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.19603      20      20      20   unknown      NaN  -5.35997  REMEDHUS            0.0      0.3  0.60           0.0      0.3  34.0        0.0      0.3  0.42           0.0      0.3  48.0           Canizal 2007-05-17 15:00:00 2024-01-01 00:00:00        0.0     0.05  soil_temperature  REMEDHUS/Canizal/REMEDHUS_REMEDHUS_Canizal_ts_...  header_values
1                  0.0      0.3  18.0        BSk        unknown     -99.9   not representative   4287.0 -3.797824  2219.0        0.0     0.05  Stevens-Hydra-Probe  41.19603  41.19603      20      20      20   unknown -5.35997  -5.35997  REMEDHUS            0.0      0.3  0.60           0.0      0.3  34.0        0.0      0.3  0.42           0.0      0.3  48.0           Canizal 2007-05-17 15:00:00 2024-01-01 00:00:00        0.0     0.05     soil_moisture  REMEDHUS/Canizal/REMEDHUS_REMEDHUS_Canizal_sm_...  header_values
2                  0.0      0.3  21.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.31243      20      20      20   unknown      NaN  -5.16140  REMEDHUS            0.0      0.3  0.65           0.0      0.3  36.0        0.0      0.3  0.43           0.0      0.3  43.0       Carramedina 2005-03-16 14:00:00 2010-12-31 23:00:00        0.0     0.05  soil_temperature  REMEDHUS/Carramedina/REMEDHUS_REMEDHUS_Carrame...  header_values
3                  0.0      0.3  21.0        BSk        unknown     -99.9   not representative   1738.0 -2.666113  2220.0        0.0     0.05  Stevens-Hydra-Probe  41.31243  41.31243      20      20      20   unknown -5.16140  -5.16140  REMEDHUS            0.0      0.3  0.65           0.0      0.3  36.0        0.0      0.3  0.43           0.0      0.3  43.0       Carramedina 2005-03-16 14:00:00 2010-12-31 23:00:00        0.0     0.05     soil_moisture  REMEDHUS/Carramedina/REMEDHUS_REMEDHUS_Carrame...  header_values
4                  0.0      0.3  18.0        BSk        unknown     -99.9  very representative   5463.0  3.064744  2221.0        0.0     0.05  Stevens-Hydra-Probe  41.26504  41.26504      10      10      10   unknown -5.38049  -5.38049  REMEDHUS            0.0      0.3  0.60           0.0      0.3  34.0        0.0      0.3  0.42           0.0      0.3  48.0         Carretoro 2005-03-15 19:00:00 2024-01-01 00:00:00        0.0     0.05     soil_moisture  REMEDHUS/Carretoro/REMEDHUS_REMEDHUS_Carretoro...  header_values
5                  0.0      0.3  18.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.26504      10      10      10   unknown      NaN  -5.38049  REMEDHUS            0.0      0.3  0.60           0.0      0.3  34.0        0.0      0.3  0.42           0.0      0.3  48.0         Carretoro 2005-03-15 19:00:00 2024-01-01 00:00:00        0.0     0.05  soil_temperature  REMEDHUS/Carretoro/REMEDHUS_REMEDHUS_Carretoro...  header_values
6                  0.0      0.3  49.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.23432      10      10      10   unknown      NaN  -5.47197  REMEDHUS            0.0      0.3  0.87           0.0      0.3  19.0        0.0      0.3  0.50           0.0      0.3  32.0       CasaGorrizo 2005-03-15 18:00:00 2007-05-22 07:00:00        0.0     0.05  soil_temperature  REMEDHUS/CasaGorrizo/REMEDHUS_REMEDHUS_CasaGor...  header_values
7                  0.0      0.3  49.0        BSk        unknown     -99.9  very representative    682.0  5.609561  2222.0        0.0     0.05  Stevens-Hydra-Probe  41.23432  41.23432      10      10      10   unknown -5.47197  -5.47197  REMEDHUS            0.0      0.3  0.87           0.0      0.3  19.0        0.0      0.3  0.50           0.0      0.3  32.0       CasaGorrizo 2005-03-15 18:00:00 2007-05-22 07:00:00        0.0     0.05     soil_moisture  REMEDHUS/CasaGorrizo/REMEDHUS_REMEDHUS_CasaGor...  header_values
8                  0.0      0.3  21.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.39392      10      10      10   unknown      NaN  -5.32146  REMEDHUS            0.0      0.3  0.65           0.0      0.3  36.0        0.0      0.3  0.43           0.0      0.3  43.0       CasaPeriles 2005-03-22 11:00:00 2024-01-01 00:00:00        0.0     0.05  soil_temperature  REMEDHUS/CasaPeriles/REMEDHUS_REMEDHUS_CasaPer...  header_values
9                  0.0      0.3  21.0        BSk        unknown     -99.9       representative   5139.0  2.401900  2223.0        0.0     0.05  Stevens-Hydra-Probe  41.39392  41.39392      10      10      10   unknown -5.32146  -5.32146  REMEDHUS            0.0      0.3  0.65           0.0      0.3  36.0        0.0      0.3  0.43           0.0      0.3  43.0       CasaPeriles 2005-03-22 11:00:00 2024-01-01 00:00:00        0.0     0.05     soil_moisture  REMEDHUS/CasaPeriles/REMEDHUS_REMEDHUS_CasaPer...  header_values
10                 0.0      0.3  21.0        BSk        unknown     -99.9       representative   5314.0  2.070086  2224.0        0.0     0.05  Stevens-Hydra-Probe  41.30010  41.30010      10      10      10   unknown -5.24704  -5.24704  REMEDHUS            0.0      0.3  0.65           0.0      0.3  36.0        0.0      0.3  0.43           0.0      0.3  43.0   ConcejodelMonte 2005-03-16 15:00:00 2024-01-01 00:00:00        0.0     0.05     soil_moisture  REMEDHUS/ConcejodelMonte/REMEDHUS_REMEDHUS_Con...  header_values
11                 0.0      0.3  21.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.30010      10      10      10   unknown      NaN  -5.24704  REMEDHUS            0.0      0.3  0.65           0.0      0.3  36.0        0.0      0.3  0.43           0.0      0.3  43.0   ConcejodelMonte 2005-03-16 15:00:00 2024-01-01 00:00:00        0.0     0.05  soil_temperature  REMEDHUS/ConcejodelMonte/REMEDHUS_REMEDHUS_Con...  header_values
12                 0.0      0.3  18.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.38134      10      10      10   unknown      NaN  -5.42922  REMEDHUS            0.0      0.3  0.60           0.0      0.3  34.0        0.0      0.3  0.42           0.0      0.3  48.0            ElCoto 2005-04-01 12:00:00 2024-01-01 00:00:00        0.0     0.05  soil_temperature  REMEDHUS/ElCoto/REMEDHUS_REMEDHUS_ElCoto_ts_0....  header_values
13                 0.0      0.3  18.0        BSk        unknown     -99.9   not representative   5421.0 -3.065143  2225.0        0.0     0.05  Stevens-Hydra-Probe  41.38134  41.38134      10      10      10   unknown -5.42922  -5.42922  REMEDHUS            0.0      0.3  0.60           0.0      0.3  34.0        0.0      0.3  0.42           0.0      0.3  48.0            ElCoto 2005-04-01 12:00:00 2024-01-01 00:00:00        0.0     0.05     soil_moisture  REMEDHUS/ElCoto/REMEDHUS_REMEDHUS_ElCoto_sm_0....  header_values
14                 0.0      0.3  49.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.34888      10      10      10   unknown      NaN  -5.49027  REMEDHUS            0.0      0.3  0.87           0.0      0.3  19.0        0.0      0.3  0.50           0.0      0.3  32.0        ElTomillar 2009-01-01 00:00:00 2024-01-01 00:00:00        0.0     0.05  soil_temperature  REMEDHUS/ElTomillar/REMEDHUS_REMEDHUS_ElTomill...  header_values
15                 0.0      0.3  49.0        BSk        unknown     -99.9   not representative   4063.0 -2.683107  2226.0        0.0     0.05  Stevens-Hydra-Probe  41.34888  41.34888      10      10      10   unknown -5.49027  -5.49027  REMEDHUS            0.0      0.3  0.87           0.0      0.3  19.0        0.0      0.3  0.50           0.0      0.3  32.0        ElTomillar 2009-01-01 00:00:00 2024-01-01 00:00:00        0.0     0.05     soil_moisture  REMEDHUS/ElTomillar/REMEDHUS_REMEDHUS_ElTomill...  header_values
16                 0.0      0.3  18.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.30582      20      20      20   unknown      NaN  -5.37566  REMEDHUS            0.0      0.3  0.60           0.0      0.3  34.0        0.0      0.3  0.42           0.0      0.3  48.0          Granja-g 2005-03-17 17:00:00 2024-01-01 00:00:00        0.0     0.05  soil_temperature  REMEDHUS/Granja-g/REMEDHUS_REMEDHUS_Granja-g_t...  header_values
17                 0.0      0.3  18.0        BSk        unknown     -99.9   not representative   5420.0 -1.707507  2227.0        0.0     0.05  Stevens-Hydra-Probe  41.30582  41.30582      20      20      20   unknown -5.37566  -5.37566  REMEDHUS            0.0      0.3  0.60           0.0      0.3  34.0        0.0      0.3  0.42           0.0      0.3  48.0          Granja-g 2005-03-17 17:00:00 2024-01-01 00:00:00        0.0     0.05     soil_moisture  REMEDHUS/Granja-g/REMEDHUS_REMEDHUS_Granja-g_s...  header_values
18                 0.0      0.3  21.0        BSk        unknown     -99.9       representative    675.0  2.512931  2228.0        0.0     0.05  Stevens-Hydra-Probe  41.46426  41.46426      10      10      10   unknown -5.44884  -5.44884  REMEDHUS            0.0      0.3  0.65           0.0      0.3  36.0        0.0      0.3  0.43           0.0      0.3  43.0    GranjaToresana 2005-04-14 18:00:00 2007-05-16 12:00:00        0.0     0.05     soil_moisture  REMEDHUS/GranjaToresana/REMEDHUS_REMEDHUS_Gran...  header_values
19                 0.0      0.3  21.0        BSk        unknown     -99.9          undeducible      NaN       NaN     NaN        0.0     0.05  Stevens-Hydra-Probe       NaN  41.46426      10      10      10   unknown      NaN  -5.44884  REMEDHUS            0.0      0.3  0.65           0.0      0.3  36.0        0.0      0.3  0.43           0.0      0.3  43.0    GranjaToresana 2005-04-14 18:00:00 2007-05-16 12:00:00        0.0     0.05  soil_temperature  REMEDHUS/GranjaToresana/REMEDHUS_REMEDHUS_Gran...  header_values
20                 0.0      0.3  18.0        BSk        unknown     -99.9       representative   1767.0  1.384542  2229.0        0.0     0.05  Stevens-Hydra-Probe  41.20048  41.20048      10      10      10   unknown -5.29738  -5.29738  REMEDHUS            0.0      0.3  0.60           0.0      0.3  34.0        0.0      0.3  0.42           0.0      0.3  48.0           Guarena 2005-03-22 17:00:00 2013-01-01 00:00:00        0.0     0.05     soil_moisture  REMEDHUS/Guarena/REMEDHUS_REMEDHUS_Guarena_sm_...  header_values

...
  2. The tutorial on using the ISMN package in general.
  3. The package documentation and metadata description table
  4. Data for testing can be downloaded from https://ismn.earth/en/dataviewer/ after registration, or can be provided by the task supervisor.
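As a rough sketch of how the metadata table could drive the buddy search, the example below rebuilds a few rows of the excerpt above as a small data frame with the same two-level columns and filters it by variable, distance and climate class. The function names `haversine_m` and `find_buddy_ids` and their keyword arguments are hypothetical; with real data, the table would come from `ismn.metadata` instead of being typed in. The sketch reads `latitude`/`longitude` because the `lat`/`lon` columns are NaN for some rows in the excerpt.

```python
import numpy as np
import pandas as pd

# A few rows copied from the metadata excerpt, same (variable, key) column layout.
columns = pd.MultiIndex.from_tuples(
    [("latitude", "val"), ("longitude", "val"), ("variable", "val"),
     ("instrument", "depth_from"), ("instrument", "depth_to"),
     ("climate_KG", "val")],
    names=["variable", "key"],
)
meta = pd.DataFrame(
    [
        [41.19603, -5.35997, "soil_temperature", 0.0, 0.05, "BSk"],  # Canizal
        [41.19603, -5.35997, "soil_moisture", 0.0, 0.05, "BSk"],     # Canizal
        [41.26504, -5.38049, "soil_moisture", 0.0, 0.05, "BSk"],     # Carretoro
        [41.23432, -5.47197, "soil_moisture", 0.0, 0.05, "BSk"],     # CasaGorrizo
    ],
    index=[0, 1, 4, 7],  # sensor IDs as in the excerpt
    columns=columns,
)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; lat2/lon2 may be arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000.0 * np.arcsin(np.sqrt(h))


def find_buddy_ids(meta, sensor_id, max_dist_m=10_000.0, same_climate=True):
    """Return the IDs of all sensors that qualify as buddies of sensor_id."""
    ref = meta.loc[sensor_id]
    dist = haversine_m(
        float(ref[("latitude", "val")]), float(ref[("longitude", "val")]),
        meta[("latitude", "val")].to_numpy(dtype=float),
        meta[("longitude", "val")].to_numpy(dtype=float),
    )
    # Enforced criterion: same measured variable as the reference sensor.
    mask = (meta[("variable", "val")] == ref[("variable", "val")]).to_numpy()
    mask &= dist <= max_dist_m
    if same_climate:
        mask &= (meta[("climate_KG", "val")] == ref[("climate_KG", "val")]).to_numpy()
    mask &= meta.index.to_numpy() != sensor_id  # a sensor is not its own buddy
    return meta.index[mask].tolist()


print(find_buddy_ids(meta, sensor_id=1, max_dist_m=9_000))
print(find_buddy_ids(meta, sensor_id=1, max_dist_m=15_000))
```

Returning index values keeps the result compatible with the idea of loading each buddy afterwards by its sensor ID.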

Applying the new code

While the focus here is not on coming up with a clear definition of a "buddy" for various applications, it is important to showcase the new feature. Therefore, implement a simple "buddy comparison" (e.g. as an IPython notebook). Choose one or more available ISMN sensors and find their buddy sensors (based on your definition). Use this information to load the data of all buddies, then compute some metrics between the time series (e.g. correlation) and visualise your results, e.g. as a heatmap.
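The comparison step could look roughly like the sketch below. It uses synthetic series in place of real ISMN data (the values are random and purely illustrative), and the names `buddy_correlation` and `plot_heatmap` are hypothetical. In the notebook, each series would instead be loaded for a buddy ID, e.g. via ISMN_Interface.read_ts.

```python
import numpy as np
import pandas as pd


def buddy_correlation(series_by_id, min_overlap=30):
    """Pairwise Pearson correlation between buddy time series.

    series_by_id maps sensor ID -> pandas Series with a datetime index.
    Pairs with fewer than min_overlap common time steps yield NaN.
    """
    frame = pd.DataFrame(series_by_id)  # aligns all series on the time index
    return frame.corr(method="pearson", min_periods=min_overlap)


def plot_heatmap(corr):
    """Draw the correlation matrix as a heatmap (requires matplotlib)."""
    import matplotlib.pyplot as plt  # local import: only needed for plotting

    fig, ax = plt.subplots()
    im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="RdBu_r")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(im, ax=ax, label="Pearson r")
    return fig


# Synthetic stand-in data: sensors 1 and 4 share a signal, sensor 7 does not.
rng = np.random.default_rng(0)
time = pd.date_range("2020-01-01", periods=500, freq="D")
signal = np.cumsum(rng.normal(size=time.size))
series = {
    1: pd.Series(signal + rng.normal(scale=0.1, size=time.size), index=time),
    4: pd.Series(signal + rng.normal(scale=0.1, size=time.size), index=time),
    7: pd.Series(rng.normal(size=time.size), index=time),
}

corr = buddy_correlation(series)
print(corr.round(2))
```

Calling `plot_heatmap(corr)` then gives the visualisation. Other metrics (e.g. bias or RMSD) could be filled into the same kind of matrix.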
