Skip to content

Commit 5b3acae

Browse files
andreaspaulingAndreas Pauling
andauthored
Relax obs requirements (#21)
* Up to 4 stations are allowed to be missing * max_miss_stns configurable * Update README.md * update utils.py * update utils.py * small reformatting --------- Co-authored-by: Andreas Pauling <paa@balfrin-ln003.cscs.ch>
1 parent 96e6bde commit 5b3acae

File tree

8 files changed

+105
-24
lines changed

8 files changed

+105
-24
lines changed

README.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,7 @@ const_file : <path>/CLON_CLAT_ICON-CH1.gb2
110110
station_obs_file : <path>/pollen_measured_2024020118.atab
111111
station_mod_file : <path>/pollen_modelled_2024020118.atab
112112
hour_incr : 1
113+
max_miss_stns : 4
113114
```
114115
`pov_infile`: This GRIB2 file must include the fields `tthrs`, `tthre` (for POAC, `saisl` instead), `saisn` and `ctsum` if the module `update_phenology` is called. If the module `update_strength` is called `pov_infile` must include the fields `saisn` and `tune`. If at least one of these mandatory fields is missing the package exits with status 1 and tells the user. `pov_infile` is used as template for `pov_outfile`, i.e. the whole file is copied to `pov_outfile` with adapted values. Date and time information of `pov_infile` does not have to be correct, ICON just throws warnings.
115116

@@ -119,6 +120,10 @@ hour_incr : 1
119120
`station_obs_file`: Observed hourly pollen concentrations (ATAB format) of the latest 120 hours relative to the target date of `pov_outfile`. The timestamps of the data in this file may vary depending on data availability, time of extraction etc. Missing values are allowed but at least 50% of each station must be there. If not, the package exits with status 1 and tells the user.
120121
`station_mod_file`: Modelled hourly pollen concentrations (ATAB format) of the latest 120 hours relative to the target date of `pov_outfile`. The timestamps of the data in this file may vary depending on data availability, time of extraction etc. In case of missing values the package exits with status 1 and tells the user. Same stations as in `station_obs_file` (only used if the module `update_strength` is called).
121122
`hour_incr`: Increment of the timestamp of the outfile relative to the infile in hours (defaults to 1; negative values also supported). This parameter should be adapted if the calibration is done for a subsequent run more than one hour ahead.
123+
`max_miss_stns`: Maximum number of stations with more than 50% missing observations. If
124+
there are less than 50% missing observation per station, they are replaced by the mean
125+
of the remaining observations; if there are more than 50% missing observations per station,
126+
the station is removed from pollen calibration. If there are more stations removed than mas_miss_stns, the pollen calibration is stopped and an alert is issued.
122127

123128

124129
### How to run the package

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ build-backend = "setuptools.build_meta"
77

88
[project]
99
name = "realtime-pollen-calibration"
10-
version = "1.1.0"
10+
version = "1.2.0"
1111
description = "Simon Adamov's Realtime Pollen Calibration, ICON extension by Andreas Pauling"
1212
readme = "README.md"
1313
keywords = [

src/realtime_pollen_calibration/set_up.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,8 @@ def set_up_config(config_file: str):
3333

3434
config.pov_outfile = data["pov_outfile"]
3535

36+
config.max_miss_stns = data["max_miss_stns"]
37+
3638
config.hour_incr = data["hour_incr"]
3739

3840
return config

src/realtime_pollen_calibration/update_phenology.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -133,7 +133,10 @@ def update_phenology_realtime(config_obj: utils.Config, verbose: bool = True):
133133
dict_fields = {}
134134
for pollen_type in utils.get_pollen_type(ds):
135135
obs_mod_data = utils.read_atab(
136-
pollen_type, config_obj.station_obs_file, verbose=verbose
136+
pollen_type,
137+
config_obj.max_miss_stns,
138+
config_obj.station_obs_file,
139+
verbose=verbose,
137140
)
138141
change_phenology_fields = utils.get_change_phenol(
139142
pollen_type, obs_mod_data, ds, verbose

src/realtime_pollen_calibration/update_strength.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,7 @@ def update_strength_realtime(config_obj: utils.Config, verbose: bool = True):
102102
for pollen_type in ptype_present:
103103
obs_mod_data = utils.read_atab(
104104
pollen_type,
105+
config_obj.max_miss_stns,
105106
config_obj.station_obs_file,
106107
config_obj.station_mod_file,
107108
verbose=verbose,

src/realtime_pollen_calibration/utils.py

Lines changed: 87 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,7 @@
1919

2020

2121
@dataclass
22-
class Config:
22+
class Config: # pylint: disable=too-many-instance-attributes
2323

2424
pov_infile: str = ""
2525
"""ICON GRIB2 file including path containing the pollen fields:
@@ -31,7 +31,7 @@ class Config:
3131
"""ICON GRIB2 file including path of the desired output file."""
3232

3333
t2m_file: str = ""
34-
""""ICON GRIB2 file including path containing T_2M."""
34+
"""ICON GRIB2 file including path containing T_2M."""
3535

3636
const_file: str = ""
3737
"""ICON GRIB2 file including path containing Longitudes (clon)
@@ -48,7 +48,14 @@ class Config:
4848
pollen concentrations at the stations.
4949
"""
5050

51+
max_miss_stns: int = 4
52+
"""Maximum number of stations with more than 50% missing data
53+
allowed. If more than this number of stations are missing,
54+
the pollen calibration is not performed.
55+
"""
56+
5157
hour_incr: int = 1
58+
"""Hour increment (grib coding) between input and output POV file."""
5259

5360

5461
ObsModData = namedtuple(
@@ -125,7 +132,11 @@ def read_clon_clat(const_file):
125132

126133

127134
def read_atab(
128-
pollen_type: str, file_obs_stns: str, file_mod_stns: str = "", verbose: bool = True
135+
pollen_type: str,
136+
max_miss_stns: int,
137+
file_obs_stns: str,
138+
file_mod_stns: str = "",
139+
verbose: bool = True,
129140
) -> ObsModData:
130141
# pylint: disable=too-many-locals
131142
"""Read the pollen concentrations and the station locations from the ATAB files.
@@ -134,6 +145,7 @@ def read_atab(
134145
pollen_type: String describing the pollen type analysed.
135146
file_obs_stns: Location of the observation ATAB file.
136147
file_mod_stns: Location of the model ATAB file. (Optional)
148+
max_miss_stns: Max. number of stations with more than 50% missing data
137149
verbose: Optional additional debug prints.
138150
139151
Returns:
@@ -188,13 +200,13 @@ def read_obs_header(file_data: str):
188200
return HeaderData(coord_stns, missing_value, stn_indicators, n_header)
189201

190202
headerdata = read_obs_header(file_obs_stns)
191-
data = pd.read_csv(
203+
data_obs = pd.read_csv(
192204
file_obs_stns,
193205
header=headerdata.n_header,
194206
sep=r"\s+",
195207
parse_dates=[[1, 2, 3, 4, 5]],
196208
)
197-
data = data[data["PARAMETER"] == pollen_type].iloc[:, 2:].to_numpy()
209+
data_obs = data_obs[data_obs["PARAMETER"] == pollen_type].iloc[:, 2:].to_numpy()
198210
if file_mod_stns != "":
199211
stn_indicators_mod = None
200212
missing_value = None
@@ -207,7 +219,6 @@ def read_obs_header(file_data: str):
207219
if line.strip()[0:9] == "PARAMETER":
208220
n_header_mod = n
209221
break
210-
istation_mod = get_mod_stn_index(headerdata.stn_indicators, stn_indicators_mod)
211222
data_mod = pd.read_csv(
212223
file_mod_stns,
213224
header=n_header_mod,
@@ -228,11 +239,24 @@ def read_obs_header(file_data: str):
228239
else:
229240
data_mod = 0
230241
istation_mod = 0
231-
data = treat_missing(
232-
data, headerdata.missing_value, headerdata.stn_indicators, verbose=verbose
242+
data_obs, headerdata = treat_missing(
243+
data_obs,
244+
headerdata,
245+
max_miss_stns,
246+
headerdata.stn_indicators,
247+
headerdata.missing_value,
248+
verbose=verbose,
233249
)
250+
# Calculating the station correspondence indices of obs/mod data.
251+
if file_mod_stns != "":
252+
istation_mod = get_mod_stn_index(headerdata.stn_indicators, stn_indicators_mod)
253+
234254
return ObsModData(
235-
data, headerdata.coord_stns, headerdata.missing_value, data_mod, istation_mod
255+
data_obs,
256+
headerdata.coord_stns,
257+
headerdata.missing_value,
258+
data_mod,
259+
istation_mod,
236260
)
237261

238262

@@ -252,27 +276,36 @@ def create_data_arrays(cal_fields, clon, clat, time_values):
252276
return cal_fields_arrays
253277

254278

255-
def treat_missing(
279+
def treat_missing( # pylint: disable=too-many-positional-arguments,too-many-arguments
256280
array,
281+
headerdata,
282+
max_miss_stns,
283+
stn_indicators: np.ndarray,
257284
missing_value: float = -9999.0,
258-
stn_indicators: str = "",
259285
verbose: bool = True,
260286
):
261287
"""Treat the missing values of the input array.
262288
263289
Args:
264290
array: Array containing the concentration values.
291+
headerdata: HeaderData containing ATAB header info.
265292
missing_value: Value considered as a missing value.
293+
stn_indicators: Array of station indicators (PBS, PBU, ...).
294+
max_miss_stns: Max. number of stations with more than 50% missing data
266295
verbose: Optional additional debug prints.
267296
268297
Returns:
269-
array: Modified array with removed missing values.
298+
array: Modified array with replaced missing values or removed stations.
299+
headerdata: Updated metadata on the stations matching the changes in array.
270300
271301
"""
272302
array_missing = array == missing_value
273303
nstns = array.shape[1]
274304
skip_missing_stn = np.zeros(nstns)
275-
for istation in range(nstns):
305+
306+
stns_missing = 0
307+
istation = 0
308+
while istation < nstns:
276309
skip_missing_stn[istation] = np.count_nonzero(array_missing[:, istation])
277310
if verbose:
278311
print(
@@ -295,14 +328,48 @@ def treat_missing(
295328
array[idx2, istation] = np.mean(array[idx1, istation])
296329
else:
297330
print(
298-
f"Station {stn_indicators[istation]} has more than 50% missing ",
299-
"data. Please check the reason (Does jretrievedwh still work?).\n",
300-
"No pollen calibration update is performed until this is fixed! ",
301-
"Pollen in ICON will still work, but calibration fields get ",
302-
"more and more outdated.",
331+
f"Station {stn_indicators[istation]} has more than 50% missing\n",
332+
"data and is REMOVED from this pollen calibration run.\n",
333+
"Please check the reason (Does jretrievedwh still work?).\n",
334+
"If this occurs only at few stations the impact on the pollen\n",
335+
"calibration is minimal. However, if more stations are affected,\n",
336+
"switching off the pollen calibration should be considered,\n",
337+
"(depending on which stations, species and time of the year).",
303338
)
304-
sys.exit(1)
305-
return array
339+
stns_missing += 1
340+
if stns_missing > max_miss_stns:
341+
print(
342+
f"\nALERT: More than {max_miss_stns} stations have more than\n",
343+
"50% missing data, no pollen calibration is performed!\n",
344+
"Pollen are still running but fix this asap by checking\n",
345+
"the reason for the missing observations.\n",
346+
)
347+
sys.exit(1)
348+
349+
# Remove the station from array, coord_stns, and stn_indicators
350+
array = np.delete(array, istation, axis=1)
351+
352+
updated_coord_stns = (
353+
headerdata.coord_stns[:istation]
354+
+ headerdata.coord_stns[istation + 1 :]
355+
)
356+
headerdata = HeaderData(
357+
coord_stns=updated_coord_stns,
358+
missing_value=headerdata.missing_value,
359+
stn_indicators=np.delete(headerdata.stn_indicators, istation),
360+
n_header=headerdata.n_header,
361+
)
362+
stn_indicators = np.delete(stn_indicators, istation)
363+
array_missing = np.delete(array_missing, istation, axis=1)
364+
365+
# Update nstns since a station was removed
366+
nstns -= 1
367+
# Do not increment istation; check the next station at the same index
368+
continue
369+
# Increment istation to move to the next station
370+
istation += 1
371+
372+
return array, headerdata
306373

307374

308375
def get_field_at(ds, field: str, coords: tuple):

tests/test_realtime_pollen_calibration/test_utils.py

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -38,8 +38,10 @@ def test_interpolation():
3838
encode_cf=("time", "geography", "vertical"),
3939
)
4040
obs_mod_data = utils.read_atab(
41-
"ALNU",
42-
str(here()) + "/RTcal_testdata/alnu_pollen_measured_values_2022022207.atab",
41+
pollen_type="ALNU",
42+
max_miss_stns=2,
43+
file_obs_stns=str(here())
44+
+ "/RTcal_testdata/alnu_pollen_measured_values_2022022207.atab",
4345
)
4446
######################
4547
# Specify the test

tools/get_data_and_test.sh

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ const_file : $localpath/RTcal_testdata/CLON_CLAT_ICON-CH1.gb2
1919
station_obs_file : $localpath/RTcal_testdata/pollen_measured_2024020118.atab
2020
station_mod_file : $localpath/RTcal_testdata/pollen_modelled_2024020118.atab
2121
hour_incr : 1
22+
max_miss_stns : 4
2223
EOF
2324

2425
# activate conda env

0 commit comments

Comments
 (0)