Background
An ISMN data collection contains measurements from multiple sensors that, under certain circumstances, are expected to be similar. Examples are measurements of the same variable (e.g., soil moisture or temperature) in a similar soil layer (depth) at the same location (lon/lat) or, depending on the application, even sensors that are located close to each other (e.g. at the same site but at different depths, in the same field, or in the same climate) and are therefore expected to share some features (e.g. the signal of precipitation events, seasonal signals, etc.). We call them "sensor buddies". There is currently no clear definition of what exactly a "buddy" is, as this depends on the application.
In this task you will implement code to enable and simplify future analyses of ISMN sensor buddies, which is a first step towards multiple potential improvements.
The ismn package on GitHub implements methods to simplify the use of ISMN data in Python. A function to create buddy groups in this package would, for example:
- Help inter-compare "similar" sensors and detect potential outliers / malfunctioning sensors from a data collection
- Enable combining the data from multiple sensors (e.g. to fill data gaps or generally improve the data quality) to end up with a "better" time series
- Potentially upscale measurements from multiple locations to create a "more representative" time series from point-wise in situ measurements for soil moisture dynamics at the satellite scale (multiple kilometres)
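To illustrate the gap-filling use case, here is a minimal sketch that fills missing values in a reference series with the mean of its buddies at the same timestamps. It assumes the series already share a common time index and units; `fill_gaps_from_buddies` is a hypothetical helper, not part of the ismn package, and a real application would probably rescale or bias-correct the buddies first.

```python
import numpy as np
import pandas as pd


def fill_gaps_from_buddies(reference: pd.Series, buddies: list) -> pd.Series:
    """Hypothetical helper: replace NaNs in `reference` with the buddy mean.

    Assumes all series are in the same units and on comparable time stamps;
    no bias correction or rescaling is applied.
    """
    # Align all buddies on the reference index and average them per timestamp
    buddy_mean = pd.concat(buddies, axis=1).reindex(reference.index).mean(axis=1)
    # Keep reference values where present, otherwise take the buddy mean
    return reference.combine_first(buddy_mean)


idx = pd.date_range("2020-01-01", periods=4, freq="D")
ref = pd.Series([0.20, np.nan, 0.22, np.nan], index=idx)
b1 = pd.Series([0.21, 0.25, 0.23, np.nan], index=idx)
b2 = pd.Series([0.19, 0.27, 0.21, np.nan], index=idx)
filled = fill_gaps_from_buddies(ref, [b1, b2])
print(filled.round(3).tolist())
```

Timestamps where neither the reference nor any buddy has a value stay NaN.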
Task description
The task is to implement methods, probably in the Sensor class (e.g. Sensor.is_buddy_sensor(self, other: Sensor) -> bool) and/or the ISMN_Interface class (e.g. ISMN_Interface.find_buddies_for_sensor(candidate_id, <SELECTION KWARGS>) -> list[Sensor]), that allow creating groups of "similar" sensors, giving users sufficient (initial) options to define what "similar" means for their application. This should be done within the scope of the metadata available for each ISMN sensor (e.g. their location or distance from each other, their measuring depth, and possibly some of the metadata classes, such as measurements in the same climatic region).
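A minimal sketch of what such a pairwise test could look like, written against a hypothetical `SensorMeta` record instead of the real `Sensor` class so that it runs on its own. The field names, the keyword arguments (`max_dist_m`, `min_depth_overlap_m`, `same_climate`) and the use of the great-circle (haversine) distance on a spherical Earth are assumptions for illustration. The same variable is always required; the other criteria are optional. The example coordinates are those of the Canizal and Carretoro soil moisture sensors in the metadata excerpt below.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SensorMeta:
    """Hypothetical stand-in for the metadata of one ISMN sensor."""
    variable: str
    lat: float
    lon: float
    depth_from: float  # m
    depth_to: float  # m
    climate_kg: str


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def is_buddy_sensor(ref: SensorMeta, other: SensorMeta,
                    max_dist_m: Optional[float] = None,
                    min_depth_overlap_m: Optional[float] = None,
                    same_climate: bool = False) -> bool:
    # Enforced criterion: only sensors measuring the same variable qualify
    if ref.variable != other.variable:
        return False
    # Optional criterion: horizontal distance between the two sensors
    if max_dist_m is not None:
        if haversine_m(ref.lat, ref.lon, other.lat, other.lon) > max_dist_m:
            return False
    # Optional criterion: length of the shared depth interval (negative = no overlap)
    if min_depth_overlap_m is not None:
        overlap = min(ref.depth_to, other.depth_to) - max(ref.depth_from, other.depth_from)
        if overlap < min_depth_overlap_m:
            return False
    # Optional criterion: same Koeppen-Geiger climate class
    if same_climate and ref.climate_kg != other.climate_kg:
        return False
    return True


canizal = SensorMeta("soil_moisture", 41.19603, -5.35997, 0.0, 0.05, "BSk")
carretoro = SensorMeta("soil_moisture", 41.26504, -5.38049, 0.0, 0.05, "BSk")
print(is_buddy_sensor(canizal, carretoro, max_dist_m=10_000, same_climate=True))  # True (about 7.9 km apart)
print(is_buddy_sensor(canizal, carretoro, max_dist_m=100))  # False
```

Making every criterion except the variable optional keeps the decision of what "similar" means with the user, as the task requires.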
The focus of this task is on the initial implementation of such a feature, not on various definitions of a sensor buddy.
Mockup examples
This is how the functions could be used:
>> from ismn.interface import ISMN_Interface
>> ismn = ISMN_Interface("/path/to/downloaded/ismn/data")
>> ismn.find_buddies(sensor_id=1, max_dist_m=100, depth_overlap_m=0.1, same_climate=True, ...)
[2, 3, 4]  # IDs of the sensors in the collection that are buddies of sensor 1 under the chosen criteria
or also
# 2 soil moisture sensors from different stations of the same network
>> sensor1 = ismn['REMEDHUS']['Canizal']["Stevens-Hydra-Probe_soil_moisture_0.000000_0.050000"]
>> sensor2 = ismn['REMEDHUS']['Carretoro']["Stevens-Hydra-Probe_soil_moisture_0.000000_0.050000"]
# call the new function to test whether sensor2 is a buddy of sensor1
>> sensor1.has_buddy(sensor2, max_dist=10, depth_overlap_percent=100, ...)
True
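The mockups express depth similarity either as an absolute overlap (`depth_overlap_m`) or as a percentage (`depth_overlap_percent`). How a percentage should be defined is still open; the sketch below assumes it is the share of the reference sensor's depth interval that the other sensor covers, with zero-thickness (point) depths handled separately. `depth_overlap_percent` here is a hypothetical helper.

```python
def depth_overlap_percent(ref_from: float, ref_to: float,
                          other_from: float, other_to: float) -> float:
    """Share (0-100) of the reference depth interval covered by the other sensor.

    Assumption: the percentage is relative to the reference interval, so the
    measure is not symmetric. A point depth (from == to) counts as 100 % if
    the point lies inside the other interval, else 0 %.
    """
    thickness = ref_to - ref_from
    if thickness == 0:
        return 100.0 if other_from <= ref_from <= other_to else 0.0
    overlap = max(0.0, min(ref_to, other_to) - max(ref_from, other_from))
    return 100.0 * overlap / thickness


print(depth_overlap_percent(0.0, 0.05, 0.0, 0.30))  # 100.0
print(round(depth_overlap_percent(0.0, 0.30, 0.0, 0.05), 1))  # 16.7
```

With this definition a 0-5 cm sensor is fully covered by a 0-30 cm one, but not the other way round; a symmetric alternative would divide by the thinner (or the union of both) intervals instead.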
The kwargs should be designed so that users can define meaningful buddies (e.g. via the distance between sensors or other sensor metadata). Other "buddy" criteria should be enforced where it makes sense (e.g. only a candidate sensor that measures the same variable as the reference sensor can be a buddy).
The return value is probably either a list of Sensor objects or a list of sensor IDs. In either case, it should give the user enough information to subsequently load the buddy data for further processing (e.g. via the ISMN_Interface.read_ts method).
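A collection-wide search could apply the criteria to the whole metadata table at once and return the matching index values, which could then be passed on, e.g. to `ISMN_Interface.read_ts`. The sketch below assumes a simplified, flat table (one row per sensor, hypothetical columns `variable`, `lat`, `lon`, `depth_from`, `depth_to`, `climate_kg`) rather than the two-level table that `ismn.metadata` actually returns; `find_buddies` and its keywords are illustrative only. The example rows use IDs and values from the metadata excerpt below.

```python
import numpy as np
import pandas as pd


def find_buddies(meta: pd.DataFrame, sensor_id, max_dist_m: float,
                 min_depth_overlap_m: float = 0.0, same_climate: bool = False) -> list:
    """Return the index values of buddy sensors of `sensor_id` (excluding itself)."""
    ref = meta.loc[sensor_id]
    # Haversine distance (spherical Earth, metres) from the reference to all sensors
    lat1, lon1 = np.radians(ref["lat"]), np.radians(ref["lon"])
    lat2, lon2 = np.radians(meta["lat"].to_numpy()), np.radians(meta["lon"].to_numpy())
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_m = 2 * 6_371_000.0 * np.arcsin(np.sqrt(a))
    # Length of the shared depth interval with the reference sensor
    overlap_m = (np.minimum(meta["depth_to"], ref["depth_to"])
                 - np.maximum(meta["depth_from"], ref["depth_from"]))
    mask = ((meta["variable"] == ref["variable"])  # enforced criterion
            & (dist_m <= max_dist_m)
            & (overlap_m >= min_depth_overlap_m)
            & (meta.index != sensor_id))
    if same_climate:
        mask &= meta["climate_kg"] == ref["climate_kg"]
    return meta.index[mask.to_numpy()].tolist()


meta = pd.DataFrame(
    {"variable": ["soil_temperature"] + ["soil_moisture"] * 4,
     "lat": [41.19603, 41.19603, 41.31243, 41.26504, 41.23432],
     "lon": [-5.35997, -5.35997, -5.16140, -5.38049, -5.47197],
     "depth_from": [0.0] * 5, "depth_to": [0.05] * 5,
     "climate_kg": ["BSk"] * 5},
    index=[0, 1, 3, 4, 7],
)
print(find_buddies(meta, sensor_id=1, max_dist_m=10_000))  # [4]
print(find_buddies(meta, sensor_id=1, max_dist_m=15_000))  # [4, 7]
```

Returning plain index values keeps the result lightweight; Sensor objects could be looked up from these IDs afterwards if needed.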
Tools
The following tools should be sufficient to implement this feature:
- The sensor metadata table: a pandas DataFrame that contains, for each sensor, its ID (index), its sensor/instrument depth, and further metadata (location, environmental parameters, etc.) that can be used to test whether a sensor should be classified as a "buddy" based on the user requirements.
>> ismn.metadata.dropna(how='all', axis=1)
| variable | clay_fraction | clay_fraction | clay_fraction | climate_KG | climate_insitu | elevation | frm_class | frm_nobs | frm_snr | idx | instrument | instrument | instrument | lat | latitude | lc_2000 | lc_2005 | lc_2010 | lc_insitu | lon | longitude | network | organic_carbon | organic_carbon | organic_carbon | sand_fraction | sand_fraction | sand_fraction | saturation | saturation | saturation | silt_fraction | silt_fraction | silt_fraction | station | timerange_from | timerange_to | variable | variable | variable | file_path | file_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | depth_from | depth_to | val | val | val | val | val | val | val | val | depth_from | depth_to | val | val | val | val | val | val | val | val | val | val | depth_from | depth_to | val | depth_from | depth_to | val | depth_from | depth_to | val | depth_from | depth_to | val | val | val | val | depth_from | depth_to | val | val | val |
| 0 | 0.0 | 0.3 | 18.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.19603 | 20 | 20 | 20 | unknown | NaN | -5.35997 | REMEDHUS | 0.0 | 0.3 | 0.60 | 0.0 | 0.3 | 34.0 | 0.0 | 0.3 | 0.42 | 0.0 | 0.3 | 48.0 | Canizal | 2007-05-17 15:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/Canizal/REMEDHUS_REMEDHUS_Canizal_ts_... | header_values |
| 1 | 0.0 | 0.3 | 18.0 | BSk | unknown | -99.9 | not representative | 4287.0 | -3.797824 | 2219.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.19603 | 41.19603 | 20 | 20 | 20 | unknown | -5.35997 | -5.35997 | REMEDHUS | 0.0 | 0.3 | 0.60 | 0.0 | 0.3 | 34.0 | 0.0 | 0.3 | 0.42 | 0.0 | 0.3 | 48.0 | Canizal | 2007-05-17 15:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/Canizal/REMEDHUS_REMEDHUS_Canizal_sm_... | header_values |
| 2 | 0.0 | 0.3 | 21.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.31243 | 20 | 20 | 20 | unknown | NaN | -5.16140 | REMEDHUS | 0.0 | 0.3 | 0.65 | 0.0 | 0.3 | 36.0 | 0.0 | 0.3 | 0.43 | 0.0 | 0.3 | 43.0 | Carramedina | 2005-03-16 14:00:00 | 2010-12-31 23:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/Carramedina/REMEDHUS_REMEDHUS_Carrame... | header_values |
| 3 | 0.0 | 0.3 | 21.0 | BSk | unknown | -99.9 | not representative | 1738.0 | -2.666113 | 2220.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.31243 | 41.31243 | 20 | 20 | 20 | unknown | -5.16140 | -5.16140 | REMEDHUS | 0.0 | 0.3 | 0.65 | 0.0 | 0.3 | 36.0 | 0.0 | 0.3 | 0.43 | 0.0 | 0.3 | 43.0 | Carramedina | 2005-03-16 14:00:00 | 2010-12-31 23:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/Carramedina/REMEDHUS_REMEDHUS_Carrame... | header_values |
| 4 | 0.0 | 0.3 | 18.0 | BSk | unknown | -99.9 | very representative | 5463.0 | 3.064744 | 2221.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.26504 | 41.26504 | 10 | 10 | 10 | unknown | -5.38049 | -5.38049 | REMEDHUS | 0.0 | 0.3 | 0.60 | 0.0 | 0.3 | 34.0 | 0.0 | 0.3 | 0.42 | 0.0 | 0.3 | 48.0 | Carretoro | 2005-03-15 19:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/Carretoro/REMEDHUS_REMEDHUS_Carretoro... | header_values |
| 5 | 0.0 | 0.3 | 18.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.26504 | 10 | 10 | 10 | unknown | NaN | -5.38049 | REMEDHUS | 0.0 | 0.3 | 0.60 | 0.0 | 0.3 | 34.0 | 0.0 | 0.3 | 0.42 | 0.0 | 0.3 | 48.0 | Carretoro | 2005-03-15 19:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/Carretoro/REMEDHUS_REMEDHUS_Carretoro... | header_values |
| 6 | 0.0 | 0.3 | 49.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.23432 | 10 | 10 | 10 | unknown | NaN | -5.47197 | REMEDHUS | 0.0 | 0.3 | 0.87 | 0.0 | 0.3 | 19.0 | 0.0 | 0.3 | 0.50 | 0.0 | 0.3 | 32.0 | CasaGorrizo | 2005-03-15 18:00:00 | 2007-05-22 07:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/CasaGorrizo/REMEDHUS_REMEDHUS_CasaGor... | header_values |
| 7 | 0.0 | 0.3 | 49.0 | BSk | unknown | -99.9 | very representative | 682.0 | 5.609561 | 2222.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.23432 | 41.23432 | 10 | 10 | 10 | unknown | -5.47197 | -5.47197 | REMEDHUS | 0.0 | 0.3 | 0.87 | 0.0 | 0.3 | 19.0 | 0.0 | 0.3 | 0.50 | 0.0 | 0.3 | 32.0 | CasaGorrizo | 2005-03-15 18:00:00 | 2007-05-22 07:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/CasaGorrizo/REMEDHUS_REMEDHUS_CasaGor... | header_values |
| 8 | 0.0 | 0.3 | 21.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.39392 | 10 | 10 | 10 | unknown | NaN | -5.32146 | REMEDHUS | 0.0 | 0.3 | 0.65 | 0.0 | 0.3 | 36.0 | 0.0 | 0.3 | 0.43 | 0.0 | 0.3 | 43.0 | CasaPeriles | 2005-03-22 11:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/CasaPeriles/REMEDHUS_REMEDHUS_CasaPer... | header_values |
| 9 | 0.0 | 0.3 | 21.0 | BSk | unknown | -99.9 | representative | 5139.0 | 2.401900 | 2223.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.39392 | 41.39392 | 10 | 10 | 10 | unknown | -5.32146 | -5.32146 | REMEDHUS | 0.0 | 0.3 | 0.65 | 0.0 | 0.3 | 36.0 | 0.0 | 0.3 | 0.43 | 0.0 | 0.3 | 43.0 | CasaPeriles | 2005-03-22 11:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/CasaPeriles/REMEDHUS_REMEDHUS_CasaPer... | header_values |
| 10 | 0.0 | 0.3 | 21.0 | BSk | unknown | -99.9 | representative | 5314.0 | 2.070086 | 2224.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.30010 | 41.30010 | 10 | 10 | 10 | unknown | -5.24704 | -5.24704 | REMEDHUS | 0.0 | 0.3 | 0.65 | 0.0 | 0.3 | 36.0 | 0.0 | 0.3 | 0.43 | 0.0 | 0.3 | 43.0 | ConcejodelMonte | 2005-03-16 15:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/ConcejodelMonte/REMEDHUS_REMEDHUS_Con... | header_values |
| 11 | 0.0 | 0.3 | 21.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.30010 | 10 | 10 | 10 | unknown | NaN | -5.24704 | REMEDHUS | 0.0 | 0.3 | 0.65 | 0.0 | 0.3 | 36.0 | 0.0 | 0.3 | 0.43 | 0.0 | 0.3 | 43.0 | ConcejodelMonte | 2005-03-16 15:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/ConcejodelMonte/REMEDHUS_REMEDHUS_Con... | header_values |
| 12 | 0.0 | 0.3 | 18.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.38134 | 10 | 10 | 10 | unknown | NaN | -5.42922 | REMEDHUS | 0.0 | 0.3 | 0.60 | 0.0 | 0.3 | 34.0 | 0.0 | 0.3 | 0.42 | 0.0 | 0.3 | 48.0 | ElCoto | 2005-04-01 12:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/ElCoto/REMEDHUS_REMEDHUS_ElCoto_ts_0.... | header_values |
| 13 | 0.0 | 0.3 | 18.0 | BSk | unknown | -99.9 | not representative | 5421.0 | -3.065143 | 2225.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.38134 | 41.38134 | 10 | 10 | 10 | unknown | -5.42922 | -5.42922 | REMEDHUS | 0.0 | 0.3 | 0.60 | 0.0 | 0.3 | 34.0 | 0.0 | 0.3 | 0.42 | 0.0 | 0.3 | 48.0 | ElCoto | 2005-04-01 12:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/ElCoto/REMEDHUS_REMEDHUS_ElCoto_sm_0.... | header_values |
| 14 | 0.0 | 0.3 | 49.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.34888 | 10 | 10 | 10 | unknown | NaN | -5.49027 | REMEDHUS | 0.0 | 0.3 | 0.87 | 0.0 | 0.3 | 19.0 | 0.0 | 0.3 | 0.50 | 0.0 | 0.3 | 32.0 | ElTomillar | 2009-01-01 00:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/ElTomillar/REMEDHUS_REMEDHUS_ElTomill... | header_values |
| 15 | 0.0 | 0.3 | 49.0 | BSk | unknown | -99.9 | not representative | 4063.0 | -2.683107 | 2226.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.34888 | 41.34888 | 10 | 10 | 10 | unknown | -5.49027 | -5.49027 | REMEDHUS | 0.0 | 0.3 | 0.87 | 0.0 | 0.3 | 19.0 | 0.0 | 0.3 | 0.50 | 0.0 | 0.3 | 32.0 | ElTomillar | 2009-01-01 00:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/ElTomillar/REMEDHUS_REMEDHUS_ElTomill... | header_values |
| 16 | 0.0 | 0.3 | 18.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.30582 | 20 | 20 | 20 | unknown | NaN | -5.37566 | REMEDHUS | 0.0 | 0.3 | 0.60 | 0.0 | 0.3 | 34.0 | 0.0 | 0.3 | 0.42 | 0.0 | 0.3 | 48.0 | Granja-g | 2005-03-17 17:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/Granja-g/REMEDHUS_REMEDHUS_Granja-g_t... | header_values |
| 17 | 0.0 | 0.3 | 18.0 | BSk | unknown | -99.9 | not representative | 5420.0 | -1.707507 | 2227.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.30582 | 41.30582 | 20 | 20 | 20 | unknown | -5.37566 | -5.37566 | REMEDHUS | 0.0 | 0.3 | 0.60 | 0.0 | 0.3 | 34.0 | 0.0 | 0.3 | 0.42 | 0.0 | 0.3 | 48.0 | Granja-g | 2005-03-17 17:00:00 | 2024-01-01 00:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/Granja-g/REMEDHUS_REMEDHUS_Granja-g_s... | header_values |
| 18 | 0.0 | 0.3 | 21.0 | BSk | unknown | -99.9 | representative | 675.0 | 2.512931 | 2228.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.46426 | 41.46426 | 10 | 10 | 10 | unknown | -5.44884 | -5.44884 | REMEDHUS | 0.0 | 0.3 | 0.65 | 0.0 | 0.3 | 36.0 | 0.0 | 0.3 | 0.43 | 0.0 | 0.3 | 43.0 | GranjaToresana | 2005-04-14 18:00:00 | 2007-05-16 12:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/GranjaToresana/REMEDHUS_REMEDHUS_Gran... | header_values |
| 19 | 0.0 | 0.3 | 21.0 | BSk | unknown | -99.9 | undeducible | NaN | NaN | NaN | 0.0 | 0.05 | Stevens-Hydra-Probe | NaN | 41.46426 | 10 | 10 | 10 | unknown | NaN | -5.44884 | REMEDHUS | 0.0 | 0.3 | 0.65 | 0.0 | 0.3 | 36.0 | 0.0 | 0.3 | 0.43 | 0.0 | 0.3 | 43.0 | GranjaToresana | 2005-04-14 18:00:00 | 2007-05-16 12:00:00 | 0.0 | 0.05 | soil_temperature | REMEDHUS/GranjaToresana/REMEDHUS_REMEDHUS_Gran... | header_values |
| 20 | 0.0 | 0.3 | 18.0 | BSk | unknown | -99.9 | representative | 1767.0 | 1.384542 | 2229.0 | 0.0 | 0.05 | Stevens-Hydra-Probe | 41.20048 | 41.20048 | 10 | 10 | 10 | unknown | -5.29738 | -5.29738 | REMEDHUS | 0.0 | 0.3 | 0.60 | 0.0 | 0.3 | 34.0 | 0.0 | 0.3 | 0.42 | 0.0 | 0.3 | 48.0 | Guarena | 2005-03-22 17:00:00 | 2013-01-01 00:00:00 | 0.0 | 0.05 | soil_moisture | REMEDHUS/Guarena/REMEDHUS_REMEDHUS_Guarena_sm_... | header_values |
| ... |
- The tutorial on using the ISMN package in general.
- The package documentation and the metadata description table.
- Data for testing can be downloaded from https://ismn.earth/en/dataviewer/ after registration, or can be provided by the task supervisor.
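The metadata table has two column levels (`variable` and `key`), so a single column is addressed with a tuple such as `('latitude', 'val')`, and depth-dependent entries carry their own `depth_from`/`depth_to` columns. The sketch below rebuilds a small excerpt of the table above by hand (in practice the full table comes from `ismn.metadata`) and shows one possible way, using standard pandas indexing, to turn it into a flat per-sensor table that a buddy search could work on. The flat column names are an assumption.

```python
import pandas as pd

# Three-row excerpt (IDs 1, 3, 4) of the metadata table, rebuilt by hand
columns = pd.MultiIndex.from_tuples(
    [("climate_KG", "val"), ("latitude", "val"), ("longitude", "val"),
     ("variable", "depth_from"), ("variable", "depth_to"), ("variable", "val"),
     ("station", "val")],
    names=["variable", "key"],
)
metadata = pd.DataFrame(
    [["BSk", 41.19603, -5.35997, 0.0, 0.05, "soil_moisture", "Canizal"],
     ["BSk", 41.31243, -5.16140, 0.0, 0.05, "soil_moisture", "Carramedina"],
     ["BSk", 41.26504, -5.38049, 0.0, 0.05, "soil_moisture", "Carretoro"]],
    index=[1, 3, 4],
    columns=columns,
)

# Select single columns with (variable, key) tuples and give them flat names
flat = pd.DataFrame({
    "variable": metadata[("variable", "val")],
    "lat": metadata[("latitude", "val")],
    "lon": metadata[("longitude", "val")],
    "depth_from": metadata[("variable", "depth_from")],
    "depth_to": metadata[("variable", "depth_to")],
    "climate_kg": metadata[("climate_KG", "val")],
})
print(flat)

# All plain ("val") entries at once, with the `key` level dropped
print(metadata.xs("val", axis=1, level="key").columns.tolist())
```

The flat table keeps the sensor IDs as its index, so any subset selected from it still points back to rows of the original metadata.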
Applying the new code
While the focus here is not on coming up with a clear definition of a "buddy" for various applications, it is important to showcase the new feature. Therefore, implement a simple "buddy comparison" (e.g. as an IPython notebook): choose one or more available ISMN sensors and find their buddy sensors (based on your definition). Use this information to load the data of all buddies, then compute some metrics between the time series (e.g. correlation) and visualise the results, e.g. as a heatmap.
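For the comparison step, pandas can compute the pairwise correlation of the buddy time series once they are aligned as columns of one DataFrame, and matplotlib can draw the resulting matrix as a heatmap. The sketch below uses synthetic series with hypothetical labels (`sensor_a`, ...) as placeholders for data loaded from real buddy sensors; using Pearson correlation on the raw (not anomaly) series and the `min_periods` threshold are assumptions.

```python
import numpy as np
import pandas as pd


def buddy_correlation(series: dict, min_periods: int = 30) -> pd.DataFrame:
    """Pairwise Pearson correlation between buddy time series.

    `series` maps a label (e.g. sensor ID) to a pd.Series indexed by time.
    Series are aligned on their timestamps; pairs with fewer than
    `min_periods` common values get NaN.
    """
    data = pd.DataFrame(series)  # outer join on the time index
    return data.corr(method="pearson", min_periods=min_periods)


def plot_heatmap(corr: pd.DataFrame, path: str = "buddy_correlation.png") -> None:
    """Draw the correlation matrix as a heatmap and save it to `path`."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without matplotlib

    fig, ax = plt.subplots(figsize=(5, 4))
    im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="RdBu_r")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(im, ax=ax, label="Pearson R")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Synthetic placeholder data: a shared seasonal cycle plus sensor-specific noise
rng = np.random.default_rng(42)
time = pd.date_range("2020-01-01", periods=365, freq="D")
seasonal = 0.25 + 0.05 * np.sin(2 * np.pi * np.arange(365) / 365)
series = {
    name: pd.Series(seasonal + rng.normal(0.0, noise, 365), index=time)
    for name, noise in [("sensor_a", 0.01), ("sensor_b", 0.01), ("sensor_c", 0.05)]
}
corr = buddy_correlation(series)
print(corr.round(2))
```

With real data, the dictionary would hold the time series of the buddy sensors found for the chosen reference sensor, and calling `plot_heatmap(corr)` writes the figure to a PNG file.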