Commit ea0e431

Merge remote-tracking branch 'origin/main' into reading_grib_files
2 parents dde8b24 + 798733b commit ea0e431

58 files changed: +1854 −1181 lines

.github/workflows/pre-commit.yml

Lines changed: 1 addition & 2 deletions
@@ -26,10 +26,9 @@ jobs:
           channels: conda-forge
           channel-priority: flexible
           show-channel-urls: true
-      - name: Create dev env from unpinned reqs
+      - name: Create env from unpinned reqs
         run: |
           conda env create --name dev_env --file requirements/requirements.yml
-          conda env update --name dev_env --file requirements/dev-requirements.yml
       - name: Install pre-commit hooks
         run: |
           conda run --name dev_env pre-commit install-hooks

.github/workflows/pytest.yml

Lines changed: 0 additions & 1 deletion
@@ -29,7 +29,6 @@ jobs:
       - name: Create dev env from unpinned reqs
         run: |
           conda env create --name dev_env --file requirements/requirements.yml
-          conda env update --name dev_env --file requirements/dev-requirements.yml
       - name: Run Pytest
         env:
           TZ: Europe/Zurich

.pre-commit-config.yaml

Lines changed: 12 additions & 11 deletions
@@ -111,16 +111,17 @@ repos:
         entry: pydocstyle
         types: [python]
         files: ^src/
-  # - repo: local
-  #   hooks:
-  #     - id: pylint
-  #       name: pylint
-  #       description: Check Python code for correctness, consistency and adherence to best practices
-  #       language: system
-  #       entry: pylint
-  #       types: [python]
-  #       args:
-  #         - "--max-line-length=88"
+  - repo: local
+    hooks:
+      - id: pylint
+        name: pylint
+        description: Check Python code for correctness, consistency and adherence to best practices
+        language: system
+        entry: pylint
+        types: [python]
+        args:
+          - "--max-line-length=88"
+          - "--disable=C0116,R0912,R0913,R0914,R0915,R1710,W0511,W0719"
   - repo: local
     hooks:
       - id: flake8
@@ -132,4 +133,4 @@ repos:
         args:
           - "--max-line-length=88"
           - "--ignore=E203,W503,F811,I002"
-          # - "--max-complexity=10"
+          - "--max-complexity=12"

.vscode/settings.json

Lines changed: 0 additions & 21 deletions
This file was deleted.

README.md

Lines changed: 109 additions & 39 deletions
@@ -84,74 +84,140 @@ Even though probtest is used exclusively with ICON at the moment, it does not co
 
 This command sets up the configuration file. For more help on the command line arguments for `init`, see
 
-```
+```console
 python probtest.py init --help
 ```
 
 The `--template-name` argument can be used to specify the template from which the configuration file is created. One of the templates provided by probtest is `templates/ICON.jinja` which is used as the default in case no other template name is provided. The init command replaces all placeholder values in the template with the values given as command line arguments. All other probtest commands can then read from the configuration file. The name of the configuration file to use is read from the `PROBTEST_CONFIG` environment variable. If this is not set explicitly, probtest will look for a file called `probtest.json` in the current directory.
 
 Setting up the configuration file with `init` may not fit perfectly to where you want your probtest files to be. In that case, you can manually edit the file after creation. Alternatively, you can add arguments for your probtest commands on the command line which take precedence over the configuration file defaults. For more help on the options on a specific command, see
 
-```
+```console
 python probtest.py {command} --help
 ```
 
-### Example: Check the output of an experiment
+### Example: Check the output of an ICON experiment with a test build against a reference build
 
-Objective: Run the mch_opr_r04b07 ICON experiment and check if the output of the run is ok. Probtest requires some additional python packages. On Piz Daint, there is a pre-installed python environment which can be loaded with:
+Objective: Run an `exp_name` ICON experiment with a test build and check whether
+the output of the test lies within a perturbed ensemble of the reference build.
+This is in particular used to validate a GPU build against a CPU reference.
 
-```
-source /project/g110/icon/probtest/conda/miniconda/bin/activate probtest
-```
+All requirements for using probtest can be easily installed with conda using the
+setup scripts:
 
-Alternatively, all requirements can be easily installed with conda:
-
-```
+```console
 ./setup_miniconda.sh
-./setup_env.sh -n probtest -d -u
+./setup_env.sh -n probtest -u
 ```
 
-Once set up, probtest can generate the config file according to your needs:
+#### Initialize probtest
+Once set up, probtest can generate the config file according to your needs.
+Initialize a `probtest.json` file in your reference build directory; `exp_name`
+here should refer to your experiment script:
 
+```console
+cd icon-base-dir/reference-build
+python ../externals/probtest/probtest.py init --codebase-install $PWD --experiment-name exp_name --reference $PWD --file-id NetCDF "*atm_3d_ml*.nc" --file-id NetCDF "*atm_3d_il*.nc" --file-id NetCDF "*atm_3d_hl*.nc" --file-id NetCDF "*atm_3d_pl*.nc" --file-id latlon "*atm_2d_ll*.nc" --file-id meteogram "Meteogram*.nc"
 ```
-python probtest.py init --codebase-install /path/to/the/ICON/Installation/ --experiment-name mch_opr_r04b07 --reference /path/to/icon-test-references/daint_cpu_pgi/ --file-id NetCDF "*atm_3d_ml*" --file-id NetCDF "*atm_3d_hl*"
+You might need to update the account used in the json file.
+The perturbation amplitude may also need to be changed in the json file
+(buildbot uses 1e-07 for mixed precision and 1e-14 for double precision).
+To change this, modify the second entry of `rhs_new` in
+`probtest.json`, which is set to 1e-14 by default.
+
+Note that it is important that the `file-id`s uniquely describe data
+with the same structure.
+Otherwise you might get an error like
+```console
+packages/pandas/core/indexes/base.py", line 4171, in _validate_can_reindex
+    raise ValueError("cannot reindex on an axis with duplicate labels")
+ValueError: cannot reindex on an axis with duplicate labels
 ```
+For examples of proper `file-id`s, have a look in the ICON repo at
+`run/tolerance/set_probtest_file_id`.
+
+Now you should have created a `probtest.json` file in the reference build directory.
+This file contains all information needed by probtest to process the ICON experiment.
+
+#### Generate references and tolerances for the reference build
+With everything set up properly, the chain of commands can be invoked to run the
+reference binary (`run-ensemble`), generate the statistics files used for
+probtest comparisons (`stats`) and generate tolerances from these files
+(`tolerance`).
+To run the perturbed experiments and wait for the submitted jobs to finish:
+```console
+python ../externals/probtest/probtest.py run-ensemble
+```
+FYI: if the experiment does not generate all of the files listed in the
+`file-id`s above, you will receive a message that certain `file-id` patterns do
+not match any file.
+Those patterns can then be removed from the `file-id`s.
 
-This will create a `probtest.json` file in the current directory. This file contains all information needed by probtest to process the ICON experiment.
-
-With everything set up properly, the chain of commands can be invoked to run the CPU reference binary (`run-ensemble`), generate the statistics files used for probtest comparisons (`stats`) and generate tolerances from these files (`tolerance`).
-
+Extract the statistics of your perturbed runs:
+```console
+python ../externals/probtest/probtest.py stats --ensemble
 ```
-python probtest.py run-ensemble
-python probtest.py stats --ensemble
-python probtest.py tolerance
+The `--ensemble` option takes precedence over the
+default `False` from the configuration and makes probtest process the model
+output from each ensemble member generated by `run-ensemble`.
+
+Finally, create the tolerance file for `exp_name` by analysing those
+statistics:
+```console
+python ../externals/probtest/probtest.py tolerance
 ```
 
-Note the `--ensemble` option which is set to take precedence over the default `False` from the configuration and make probtest process the model output from each ensemble generated by `run-ensemble`. These commands will generate a number of files:
+These commands will generate a number of files:
 
 - `stats_ref.csv`: contains the post-processed output from the unperturbed reference run
 - `stats_{member_num}.csv`: contain the post-processed output from the perturbed reference runs (only needed temporarily to generate the tolerance file)
-- `mch_opr_r04b07_tolerance.csv`: contains tolerance ranges computed from the stats-files
+- `exp_name_tolerance.csv`: contains tolerance ranges computed from the stats-files
 
-These can then be used to compare against the output of a test binary (usually a GPU binary). For that, manually run the `exp.mch_opr_r04b07.run` experiment with the test binary to produce the test output. Then use probtest to generate the stats file for this output:
+These can then be used to compare against the output of a test binary (usually a
+GPU binary).
+For that, manually run the `exp_name.run` experiment with the test binary to
+produce the test output.
 
-```
-python probtest.py stats --model-output-dir /path/to/test-icon/experiments/mch_opr_r04b07 --stats-file-name stats_cur.csv
+#### Run and check with test build
+
+To check whether the data from the test binary validates against the reference
+build, first run the experiments with the test build.
+Run your test simulation without probtest:
+```console
+cd icon-base-dir/test-build
+sbatch run/exp_name.run
 ```
 
-Note how `--model-output-dir` is set to take precedence over the default which points to the reference binary output to now point to the test binary output as well as the name of the generated file with `--stats-file-name` to avoid name clash with the stats file from the reference. This command will generate the following file:
+Then create the test statistics with:
+```console
+python ../externals/probtest/probtest.py stats --no-ensemble --model-output-dir icon-base-dir/test-build/experiments/exp_name
+```
+Note how `--model-output-dir` is set to take precedence over the default, which
+points to the reference binary output, so that it now points to the test binary output.
+This command will generate the following file:
 
-- `stats_cur.csv`: contains the post-processed output of the test binary model output.
+- `stats_exp_name.csv`: contains the post-processed output of the test binary model output.
 
-Now all files needed to perform a probtest check are available; the reference file `stats_ref.csv`, the test file `stats_cur.csv` as well as the tolerance range `mch_opr_r04b07_tolerance.csv`. Providing these files to `check` will perform the check:
+Now all files needed to perform a probtest check are available: the reference
+file `stats_ref.csv`, the test file `stats_exp_name.csv` as well as the tolerance
+range `exp_name_tolerance.csv`.
+Providing these files to `check` will perform the check:
 
+```console
+python ../externals/probtest/probtest.py check --input-file-ref stats_ref.csv --input-file-cur stats_exp_name.csv --factor 5
 ```
-python probtest.py check --input-file-ref stats_ref.csv --input-file-cur stats_cur.csv
+
+This check can also be visualized by:
+```console
+python ../externals/probtest/probtest.py check-plot --input-file-ref stats_ref.csv --input-file-cur stats_exp_name.csv --tolerance-file-name exp_name_tolerance.csv --factor 5 --savedir ./plot_dir
 ```
 
-Note that the reference `--input-file-ref` and test stats files `--input-file-cur` need to be set by command line arguments. This is because the default stored in the `ICON.jinja` template is pointing to two files from the ensemble as a sanity check.
+Note that the reference `--input-file-ref` and test stats files
+`--input-file-cur` need to be set by command line arguments.
+This is because the default stored in the `ICON.jinja` template points to
+two files from the ensemble as a sanity check.
 
-## Developing in probtest
+## Developing probtest
 #### Testing with [pytest](https://docs.pytest.org/en/8.2.x/)
 
 Our tests are executed using `pytest`, ensuring a consistent and efficient testing process. Each test dynamically generates its necessary test data, allowing for flexible and isolated testing scenarios.
@@ -169,18 +235,22 @@ pytest -s -v path/to/your/test.py
 
 Reference data, crucial for validating the outcomes of our tests and detecting any deviations in `probtests` results, is maintained in the [tests/data](tests/data) directory. This approach guarantees that our tests are both comprehensive and reliable, safeguarding the integrity of our codebase.
 
-### Code formatting
+### Formatting probtest source code
 
-Code is formatted using black and isort. Please install the pre-commit hooks (after installing all Python requirements including the `pre-commit` package):
+The probtest source code is formatted using multiple formatters.
+Please install the pre-commit hooks (after installing all Python requirements
+including the `pre-commit` package):
 
-```
+```console
 pre-commit install
 ```
 
-This hook will be executed automatically whenever you commit. It will check your files and format them according to its rules. If files have to be formatted, committing will fail. Just commit again to finalize the commit. You can also run the following command, to trigger the pre-commit action without actually committing:
-
-```
+This hook will be executed automatically whenever you commit.
+It will check your files and format them according to its rules.
+If files have to be formatted, committing will fail.
+Just stage and commit again to finalize the commit.
+You can also run the following command to trigger the pre-commit action without
+actually committing:
+```console
 pre-commit run --all-files
 ```
-
-If you are using VSCode with the settings provided by this repository in `.vscode/settings.json` formatting is already enabled on save.
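The "cannot reindex on an axis with duplicate labels" error shown in the README changes above is raised by pandas whenever an index carrying duplicate labels is reindexed, which is what happens when two `file-id` patterns match the same files. A minimal reproduction (the exact message wording varies across pandas versions; the index labels here are made up):

```python
import pandas as pd

# Two rows share a label, as happens when two file-id patterns
# both match the same output file and its statistics are read twice.
s = pd.Series([1.0, 2.0], index=["atm_3d_ml", "atm_3d_ml"])

try:
    s.reindex(["atm_3d_ml", "atm_3d_hl"])
except ValueError as err:
    # Recent pandas: "cannot reindex on an axis with duplicate labels"
    print(err)
```

Making each pattern match a disjoint set of files avoids the duplicate labels in the first place.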

engine/cdo_table.py

Lines changed: 26 additions & 19 deletions
@@ -1,3 +1,10 @@
+"""
+CLI for CDO Table Generation
+
+This module computes and generates a CDO table by comparing model output data
+against perturbed model data.
+"""
+
 import tempfile
 from pathlib import Path
 
@@ -14,7 +21,7 @@
 from util.log_handler import logger
 
 
-def rel_diff(var1, var2):
+def compute_rel_diff(var1, var2):
     rel_diff = np.zeros_like(var1)
 
     mask_0 = np.logical_and(np.abs(var1) < 1e-15, np.abs(var2) < 1e-15)
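The renamed helper computes an element-wise relative difference, special-casing points where both inputs are effectively zero. The full masking logic is not visible in this hunk; the sketch below only illustrates the idea, and the normalization by `|var1|` is an assumption, not necessarily probtest's exact formula:

```python
import numpy as np

def compute_rel_diff_sketch(var1, var2, eps=1e-15):
    # Where both values are below the threshold, define the relative
    # difference as exactly zero to avoid 0/0 noise.
    out = np.zeros_like(var1, dtype=float)
    tiny = np.logical_and(np.abs(var1) < eps, np.abs(var2) < eps)
    valid = ~tiny
    # Assumed normalization: |a - b| / |a| on the remaining points.
    out[valid] = np.abs(var1[valid] - var2[valid]) / np.abs(var1[valid])
    return out
```

With `var1 = [0, 2, 4]` and `var2 = [0, 1, 5]` this yields `[0, 0.5, 0.25]`.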
@@ -31,8 +38,14 @@ def rel_diff(var1, var2):
 
 
 def rel_diff_stats(
-    file_id, filename, varname, time_dim, horizontal_dims, xarray_ds, fill_value_key
-):
+    file_id,
+    filename,
+    varname,
+    time_dim,
+    horizontal_dims,
+    xarray_ds,
+    fill_value_key,
+):  # pylint: disable=unused-argument
     dims = xarray_ds[varname].dims
     dataarray = xarray_ds[varname]
     time = xarray_ds[time_dim].values
@@ -52,7 +65,7 @@ def rel_diff_stats(
         mask = data != dataarray.attrs[fill_value_key]
         data = data[mask]
 
-        hist, edges = np.histogram(data, bins=[0] + cdo_bins)
+        hist, _ = np.histogram(data, bins=[0] + cdo_bins)
         matrix[i * ncol] = amax[i]
         matrix[i * ncol + 1 : i * ncol + ncol] = hist
 
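The `hist, _ = np.histogram(...)` change above simply discards the unused second return value: `np.histogram` returns a `(counts, bin_edges)` pair. For example:

```python
import numpy as np

# Two samples fall in [0, 1) and one in the closing bin [1, 2].
data = np.array([0.2, 0.7, 1.5])
hist, edges = np.histogram(data, bins=[0, 1, 2])
# hist counts samples per bin; edges echoes the bin boundaries.
assert list(hist) == [2, 1]
assert list(edges) == [0, 1, 2]
```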
@@ -116,7 +129,7 @@ def cdo_table(
     # TODO: A single perturbed run provides enough data to make proper statistics.
     # refactor cdo_table interface to reflect that
     if len(member_num) == 1:
-        member_num = [i for i in range(1, member_num[0] + 1)]
+        member_num = list(range(1, member_num[0] + 1))
     if member_type:
         member_id = member_type + "_" + str(member_num[0])
     else:
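The replaced comprehension and `list(range(...))` produce the same list; the latter avoids pylint's unnecessary-comprehension warning. Expanding a single member count, as `cdo_table` does:

```python
# A lone entry [3] is expanded to the member numbers [1, 2, 3].
member_num = [3]
if len(member_num) == 1:
    member_num = list(range(1, member_num[0] + 1))
assert member_num == [1, 2, 3]
assert member_num == [i for i in range(1, 4)]  # identical to the old form
```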
@@ -132,13 +145,11 @@ def cdo_table(
 
     # step 1: compute rel-diff netcdf files
     with tempfile.TemporaryDirectory() as tmpdir:
-        for file_type, file_pattern in file_id:
+        for _, file_pattern in file_id:
             ref_files, err = file_names_from_pattern(model_output_dir, file_pattern)
             if err > 0:
                 logger.info(
-                    "did not find any files for pattern {}. Continue.".format(
-                        file_pattern
-                    )
+                    "did not find any files for pattern %s. Continue.", file_pattern
                 )
                 continue
             ref_files.sort()
@@ -147,21 +158,17 @@ def cdo_table(
             )
             if err > 0:
                 logger.info(
-                    "did not find any files for pattern {}. Continue.".format(
-                        file_pattern
-                    )
+                    "did not find any files for pattern %s. Continue.", file_pattern
                 )
                 continue
             perturb_files.sort()
 
             for rf, pf in zip(ref_files, perturb_files):
                 if not rf.endswith(".nc") or not pf.endswith(".nc"):
                     continue
-                ref_data = xr.open_dataset("{}/{}".format(model_output_dir, rf))
+                ref_data = xr.open_dataset(f"{model_output_dir}/{rf}")
                 perturb_data = xr.open_dataset(
-                    "{}/{}".format(
-                        perturbed_model_output_dir.format(member_id=member_id), pf
-                    )
+                    f"{perturbed_model_output_dir.format(member_id=member_id)}/{pf}"
                 )
                 diff_data = ref_data.copy()
                 varnames = [
@@ -171,12 +178,12 @@ def cdo_table(
                 ]
 
                 for v in varnames:
-                    diff_data.variables.get(v).values = rel_diff(
+                    diff_data.variables.get(v).values = compute_rel_diff(
                         ref_data.variables.get(v).values,
                         perturb_data.variables.get(v).values,
                     )
 
-                diff_data.to_netcdf("{}/{}".format(tmpdir, rf))
+                diff_data.to_netcdf(f"{tmpdir}/{rf}")
                 ref_data.close()
                 perturb_data.close()
                 diff_data.close()
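The `.format` calls replaced in this file are behavior-preserving: an f-string interpolates the same values into the same template. A quick check with hypothetical path values (the directory and file names below are made up):

```python
model_output_dir = "/data/output"  # hypothetical
rf = "exp_atm_3d_ml.nc"            # hypothetical

old_style = "{}/{}".format(model_output_dir, rf)
new_style = f"{model_output_dir}/{rf}"
assert old_style == new_style == "/data/output/exp_atm_3d_ml.nc"
```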
@@ -191,7 +198,7 @@ def cdo_table(
                 df.loc[:, (t, cdo_bins)].sum(axis=1), axis=0
             )
 
-    logger.info("writing cdo table to {}.".format(cdo_table_file))
+    logger.info("writing cdo table to %s.", cdo_table_file)
 
     Path(cdo_table_file).parent.mkdir(parents=True, exist_ok=True)
     df.to_csv(cdo_table_file)
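The switch from `"...{}".format(...)` to `logger.info("... %s.", value)` defers string interpolation to the logging machinery, which only formats the message if the record is actually emitted; this is the style pylint's logging-format-interpolation check (W1202) pushes toward. A minimal sketch with a throwaway logger:

```python
import io
import logging

# Route a demo logger into a buffer so the output can be inspected.
buf = io.StringIO()
logger = logging.getLogger("cdo_table_demo")
logger.addHandler(logging.StreamHandler(buf))
logger.setLevel(logging.INFO)

# %s is filled in lazily by the logging framework, not eagerly by .format().
logger.info("writing cdo table to %s.", "cdo_table.csv")

assert "writing cdo table to cdo_table.csv." in buf.getvalue()
```

If the log level suppressed the record, the `%s` substitution would never run at all, which is the practical benefit for hot code paths.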
