Commit ea0e431

Merge remote-tracking branch 'origin/main' into reading_grib_files
2 parents dde8b24 + 798733b commit ea0e431

58 files changed: +1854 −1181 lines

.github/workflows/pre-commit.yml

Lines changed: 1 addition & 2 deletions
@@ -26,10 +26,9 @@ jobs:
           channels: conda-forge
           channel-priority: flexible
           show-channel-urls: true
-      - name: Create dev env from unpinned reqs
+      - name: Create env from unpinned reqs
         run: |
           conda env create --name dev_env --file requirements/requirements.yml
-          conda env update --name dev_env --file requirements/dev-requirements.yml
       - name: Install pre-commit hooks
         run: |
           conda run --name dev_env pre-commit install-hooks

.github/workflows/pytest.yml

Lines changed: 0 additions & 1 deletion
@@ -29,7 +29,6 @@ jobs:
       - name: Create dev env from unpinned reqs
         run: |
           conda env create --name dev_env --file requirements/requirements.yml
-          conda env update --name dev_env --file requirements/dev-requirements.yml
       - name: Run Pytest
         env:
           TZ: Europe/Zurich

.pre-commit-config.yaml

Lines changed: 12 additions & 11 deletions
@@ -111,16 +111,17 @@ repos:
         entry: pydocstyle
         types: [python]
         files: ^src/
-  # - repo: local
-  #   hooks:
-  #     - id: pylint
-  #       name: pylint
-  #       description: Check Python code for correctness, consistency and adherence to best practices
-  #       language: system
-  #       entry: pylint
-  #       types: [python]
-  #       args:
-  #         - "--max-line-length=88"
+  - repo: local
+    hooks:
+      - id: pylint
+        name: pylint
+        description: Check Python code for correctness, consistency and adherence to best practices
+        language: system
+        entry: pylint
+        types: [python]
+        args:
+          - "--max-line-length=88"
+          - "--disable=C0116,R0912,R0913,R0914,R0915,R1710,W0511,W0719"
   - repo: local
     hooks:
       - id: flake8
@@ -132,4 +133,4 @@ repos:
         args:
           - "--max-line-length=88"
           - "--ignore=E203,W503,F811,I002"
-          # - "--max-complexity=10"
+          - "--max-complexity=12"

.vscode/settings.json

Lines changed: 0 additions & 21 deletions
This file was deleted.

README.md

Lines changed: 109 additions & 39 deletions
@@ -84,74 +84,140 @@ Even though probtest is used exclusively with ICON at the moment, it does not co
 
 This command sets up the configuration file. For more help on the command line arguments for `init`, see
 
-```
+```console
 python probtest.py init --help
 ```
 
 The `--template-name` argument can be used to specify the template from which the configuration file is created. One of the templates provided by probtest is `templates/ICON.jinja` which is used as the default in case no other template name is provided. The init command replaces all placeholder values in the template with the values given as command line arguments. All other probtest commands can then read from the configuration file. The name of the configuration file to use is read from the `PROBTEST_CONFIG` environment variable. If this is not set explicitly, probtest will look for a file called `probtest.json` in the current directory.
 
 Setting up the configuration file with `init` may not fit perfectly to where you want your probtest files to be. In that case, you can manually edit the file after creation. Alternatively, you can add arguments for your probtest commands on the command line which take precedence over the configuration file defaults. For more help on the options on a specific command, see
 
-```
+```console
 python probtest.py {command} --help
 ```
 
-### Example: Check the output of an experiment
+### Example: Check the output of an ICON experiment with a test build against a reference build
 
-Objective: Run the mch_opr_r04b07 ICON experiment and check if the output of the run is ok. Probtest requires some additional python packages. On Piz Daint, there is a pre-installed python environment which can be loaded with:
+Objective: Run an `exp_name` ICON experiment with a test build and check whether
+the output of the test lies within a perturbed ensemble of the reference build.
+This is in particular used to validate a GPU build against a CPU reference.
 
-```
-source /project/g110/icon/probtest/conda/miniconda/bin/activate probtest
-```
+All requirements for using probtest can be easily installed with conda using the
+setup scripts:
 
-Alternatively, all requirements can be easily installed with conda:
-
-```
+```console
 ./setup_miniconda.sh
-./setup_env.sh -n probtest -d -u
+./setup_env.sh -n probtest -u
 ```
 
-Once set up, probtest can generate the config file according to your needs:
+#### Initialize probtest
+Once set up, probtest can generate the config file according to your needs.
+Initialize a `probtest.json` file in your reference build directory; `exp_name`
+here should refer to your experiment script:
 
+```console
+cd icon-base-dir/reference-build
+python ../externals/probtest/probtest.py init --codebase-install $PWD --experiment-name exp_name --reference $PWD --file-id NetCDF "*atm_3d_ml*.nc" --file-id NetCDF "*atm_3d_il*.nc" --file-id NetCDF "*atm_3d_hl*.nc" --file-id NetCDF "*atm_3d_pl*.nc" --file-id latlon "*atm_2d_ll*.nc" --file-id meteogram "Meteogram*.nc"
 ```
-python probtest.py init --codebase-install /path/to/the/ICON/Installation/ --experiment-name mch_opr_r04b07 --reference /path/to/icon-test-references/daint_cpu_pgi/ --file-id NetCDF "*atm_3d_ml*" --file-id NetCDF "*atm_3d_hl*"
+You might need to update the account used in the json file.
+The perturbation amplitude may also need to be changed in the json file
+(buildbot uses 1e-07 for mixed precision and 1e-14 for double precision).
+To change this, modify the second entry of `rhs_new` in
+`probtest.json`, which is set to 1e-14 by default.
+
+Note that it is important that the `file-id`s uniquely describe data
+with the same structure.
+Otherwise you might get an error like
+```console
+packages/pandas/core/indexes/base.py", line 4171, in _validate_can_reindex
+    raise ValueError("cannot reindex on an axis with duplicate labels")
+ValueError: cannot reindex on an axis with duplicate labels
 ```
+For examples of proper `file-id`s, have a look in the ICON repo at
+`run/tolerance/set_probtest_file_id`.
+
+Now you should have created a `probtest.json` file in the reference build directory.
+This file contains all information needed by probtest to process the ICON experiment.
+
+#### Generate references and tolerances for the reference build
+With everything set up properly, the chain of commands can be invoked to run the
+reference binary (`run-ensemble`), generate the statistics files used for
+probtest comparisons (`stats`) and generate tolerances from these files
+(`tolerance`).
+To run the perturbed experiments and wait for the submitted jobs to finish:
+```console
+python ../externals/probtest/probtest.py run-ensemble
+```
+FYI: if the experiment does not generate all of the files listed in the
+`file-id`s above, you will receive a message that certain `file-id` patterns do
+not match any file.
+Those patterns can then be removed from the `file-id`s.
 
-This will create a `probtest.json` file in the current directory. This file contains all information needed by probtest to process the ICON experiment.
-
-With everything set up properly, the chain of commands can be invoked to run the CPU reference binary (`run-ensemble`), generate the statistics files used for probtest comparisons (`stats`) and generate tolerances from these files (`tolerance`).
-
+Extract the statistics of your perturbed runs:
+```console
+python ../externals/probtest/probtest.py stats --ensemble
 ```
-python probtest.py run-ensemble
-python probtest.py stats --ensemble
-python probtest.py tolerance
+The `--ensemble` option takes precedence over the
+default `False` from the configuration and makes probtest process the model
+output from each ensemble member generated by `run-ensemble`.
+
+Finally, create the tolerance file for `exp_name` by analysing those
+statistics:
+```console
+python ../externals/probtest/probtest.py tolerance
 ```
 
-Note the `--ensemble` option which is set to take precedence over the default `False` from the configuration and make probtest process the model output from each ensemble generated by `run-ensemble`. These commands will generate a number of files:
+These commands will generate a number of files:
 
 - `stats_ref.csv`: contains the post-processed output from the unperturbed reference run
 - `stats_{member_num}.csv`: contain the post-processed output from the perturbed reference runs (only needed temporarily to generate the tolerance file)
-- `mch_opr_r04b07_tolerance.csv`: contains tolerance ranges computed from the stats-files
+- `exp_name_tolerance.csv`: contains tolerance ranges computed from the stats-files
 
-These can then be used to compare against the output of a test binary (usually a GPU binary). For that, manually run the `exp.mch_opr_r04b07.run` experiment with the test binary to produce the test output. Then use probtest to generate the stats file for this output:
+These can then be used to compare against the output of a test binary (usually a
+GPU binary).
+For that, manually run the `exp_name.run` experiment with the test binary to
+produce the test output.
 
-```
-python probtest.py stats --model-output-dir /path/to/test-icon/experiments/mch_opr_r04b07 --stats-file-name stats_cur.csv
+#### Run and check with test build
+
+To check whether the data from the test binary validates against the reference
+build, first run the experiments with the test build.
+Run your test simulation without probtest:
+```console
+cd icon-base-dir/test-build
+sbatch run/exp_name.run
 ```
 
-Note how `--model-output-dir` is set to take precedence over the default which points to the reference binary output to now point to the test binary output as well as the name of the generated file with `--stats-file-name` to avoid name clash with the stats file from the reference. This command will generate the following file:
+Then create the test statistics with:
+```console
+python ../externals/probtest/probtest.py stats --no-ensemble --model-output-dir icon-base-dir/test-build/experiments/exp_name
+```
+Note how `--model-output-dir` is set to take precedence over the default, which
+points to the reference binary output, so that it now points to the test binary output.
+This command will generate the following file:
 
-- `stats_cur.csv`: contains the post-processed output of the test binary model output.
+- `stats_exp_name.csv`: contains the post-processed output of the test binary model output.
 
-Now all files needed to perform a probtest check are available; the reference file `stats_ref.csv`, the test file `stats_cur.csv` as well as the tolerance range `mch_opr_r04b07_tolerance.csv`. Providing these files to `check` will perform the check:
+Now all files needed to perform a probtest check are available: the reference
+file `stats_ref.csv`, the test file `stats_exp_name.csv` as well as the tolerance
+range `exp_name_tolerance.csv`.
+Providing these files to `check` will perform the check:
 
+```console
+python ../externals/probtest/probtest.py check --input-file-ref stats_ref.csv --input-file-cur stats_exp_name.csv --factor 5
 ```
-python probtest.py check --input-file-ref stats_ref.csv --input-file-cur stats_cur.csv
+
+This check can also be visualized by:
+```console
+python ../externals/probtest/probtest.py check-plot --input-file-ref stats_ref.csv --input-file-cur stats_exp_name.csv --tolerance-file-name exp_name_tolerance.csv --factor 5 --savedir ./plot_dir
 ```
 
-Note that the reference `--input-file-ref` and test stats files `--input-file-cur` need to be set by command line arguments. This is because the default stored in the `ICON.jinja` template is pointing to two files from the ensemble as a sanity check.
+Note that the reference `--input-file-ref` and test stats files
+`--input-file-cur` need to be set by command line arguments.
+This is because the default stored in the `ICON.jinja` template points to
+two files from the ensemble as a sanity check.
 
-## Developing in probtest
+## Developing probtest
 #### Testing with [pytest](https://docs.pytest.org/en/8.2.x/)
 
 Our tests are executed using `pytest`, ensuring a consistent and efficient testing process. Each test dynamically generates its necessary test data, allowing for flexible and isolated testing scenarios.
@@ -169,18 +235,22 @@ pytest -s -v path/to/your/test.py
 
 Reference data, crucial for validating the outcomes of our tests and detecting any deviations in `probtests` results, is maintained in the [tests/data](tests/data) directory. This approach guarantees that our tests are both comprehensive and reliable, safeguarding the integrity of our codebase.
 
-### Code formatting
+### Formatting probtest source code
 
-Code is formatted using black and isort. Please install the pre-commit hooks (after installing all Python requirements including the `pre-commit` package):
+The probtest source code is formatted using multiple formatters.
+Please install the pre-commit hooks (after installing all Python requirements
+including the `pre-commit` package):
 
-```
+```console
 pre-commit install
 ```
 
-This hook will be executed automatically whenever you commit. It will check your files and format them according to its rules. If files have to be formatted, committing will fail. Just commit again to finalize the commit. You can also run the following command, to trigger the pre-commit action without actually committing:
-
-```
+This hook will be executed automatically whenever you commit.
+It will check your files and format them according to its rules.
+If files have to be formatted, committing will fail.
+Just stage and commit again to finalize the commit.
+You can also run the following command to trigger the pre-commit action without
+actually committing:
+```console
 pre-commit run --all-files
 ```
-
-If you are using VSCode with the settings provided by this repository in `.vscode/settings.json` formatting is already enabled on save.
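The "cannot reindex on an axis with duplicate labels" error shown in the README changes above is raised by pandas whenever an index carrying duplicate labels is reindexed, which is what happens when two `file-id` patterns match the same files. A minimal reproduction (the exact message wording varies across pandas versions; the index labels here are made up):

```python
import pandas as pd

# Two rows share a label, as happens when two file-id patterns
# both match the same output file and its statistics are read twice.
s = pd.Series([1.0, 2.0], index=["atm_3d_ml", "atm_3d_ml"])

try:
    s.reindex(["atm_3d_ml", "atm_3d_hl"])
except ValueError as err:
    # Recent pandas: "cannot reindex on an axis with duplicate labels"
    print(err)
```

Making each pattern match a disjoint set of files avoids the duplicate labels in the first place.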

engine/cdo_table.py

Lines changed: 26 additions & 19 deletions
@@ -1,3 +1,10 @@
+"""
+CLI for CDO Table Generation
+
+This module computes and generates a CDO table by comparing model output data
+against perturbed model data.
+"""
+
 import tempfile
 from pathlib import Path
 
@@ -14,7 +21,7 @@
 from util.log_handler import logger
 
 
-def rel_diff(var1, var2):
+def compute_rel_diff(var1, var2):
     rel_diff = np.zeros_like(var1)
 
     mask_0 = np.logical_and(np.abs(var1) < 1e-15, np.abs(var2) < 1e-15)
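The renamed helper computes an element-wise relative difference, special-casing points where both inputs are effectively zero. The full masking logic is not visible in this hunk; the sketch below only illustrates the idea, and the normalization by `|var1|` is an assumption, not necessarily probtest's exact formula:

```python
import numpy as np

def compute_rel_diff_sketch(var1, var2, eps=1e-15):
    # Where both values are below the threshold, define the relative
    # difference as exactly zero to avoid 0/0 noise.
    out = np.zeros_like(var1, dtype=float)
    tiny = np.logical_and(np.abs(var1) < eps, np.abs(var2) < eps)
    valid = ~tiny
    # Assumed normalization: |a - b| / |a| on the remaining points.
    out[valid] = np.abs(var1[valid] - var2[valid]) / np.abs(var1[valid])
    return out
```

With `var1 = [0, 2, 4]` and `var2 = [0, 1, 5]` this yields `[0, 0.5, 0.25]`.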
@@ -31,8 +38,14 @@ def rel_diff(var1, var2):
 
 
 def rel_diff_stats(
-    file_id, filename, varname, time_dim, horizontal_dims, xarray_ds, fill_value_key
-):
+    file_id,
+    filename,
+    varname,
+    time_dim,
+    horizontal_dims,
+    xarray_ds,
+    fill_value_key,
+):  # pylint: disable=unused-argument
     dims = xarray_ds[varname].dims
     dataarray = xarray_ds[varname]
     time = xarray_ds[time_dim].values
@@ -52,7 +65,7 @@ def rel_diff_stats(
         mask = data != dataarray.attrs[fill_value_key]
         data = data[mask]
 
-        hist, edges = np.histogram(data, bins=[0] + cdo_bins)
+        hist, _ = np.histogram(data, bins=[0] + cdo_bins)
         matrix[i * ncol] = amax[i]
         matrix[i * ncol + 1 : i * ncol + ncol] = hist
 
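The `hist, _ = np.histogram(...)` change above simply discards the unused second return value: `np.histogram` returns a `(counts, bin_edges)` pair. For example:

```python
import numpy as np

# Two samples fall in [0, 1) and one in the closing bin [1, 2].
data = np.array([0.2, 0.7, 1.5])
hist, edges = np.histogram(data, bins=[0, 1, 2])
# hist counts samples per bin; edges echoes the bin boundaries.
assert list(hist) == [2, 1]
assert list(edges) == [0, 1, 2]
```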
@@ -116,7 +129,7 @@ def cdo_table(
     # TODO: A single perturbed run provides enough data to make proper statistics.
     # refactor cdo_table interface to reflect that
     if len(member_num) == 1:
-        member_num = [i for i in range(1, member_num[0] + 1)]
+        member_num = list(range(1, member_num[0] + 1))
     if member_type:
         member_id = member_type + "_" + str(member_num[0])
     else:
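The replaced comprehension and `list(range(...))` produce the same list; the latter avoids pylint's unnecessary-comprehension warning. Expanding a single member count, as `cdo_table` does:

```python
# A lone entry [3] is expanded to the member numbers [1, 2, 3].
member_num = [3]
if len(member_num) == 1:
    member_num = list(range(1, member_num[0] + 1))
assert member_num == [1, 2, 3]
assert member_num == [i for i in range(1, 4)]  # identical to the old form
```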
@@ -132,13 +145,11 @@ def cdo_table(
 
     # step 1: compute rel-diff netcdf files
     with tempfile.TemporaryDirectory() as tmpdir:
-        for file_type, file_pattern in file_id:
+        for _, file_pattern in file_id:
             ref_files, err = file_names_from_pattern(model_output_dir, file_pattern)
             if err > 0:
                 logger.info(
-                    "did not find any files for pattern {}. Continue.".format(
-                        file_pattern
-                    )
+                    "did not find any files for pattern %s. Continue.", file_pattern
                 )
                 continue
             ref_files.sort()
@@ -147,21 +158,17 @@ def cdo_table(
             )
             if err > 0:
                 logger.info(
-                    "did not find any files for pattern {}. Continue.".format(
-                        file_pattern
-                    )
+                    "did not find any files for pattern %s. Continue.", file_pattern
                 )
                 continue
             perturb_files.sort()
 
             for rf, pf in zip(ref_files, perturb_files):
                 if not rf.endswith(".nc") or not pf.endswith(".nc"):
                     continue
-                ref_data = xr.open_dataset("{}/{}".format(model_output_dir, rf))
+                ref_data = xr.open_dataset(f"{model_output_dir}/{rf}")
                 perturb_data = xr.open_dataset(
-                    "{}/{}".format(
-                        perturbed_model_output_dir.format(member_id=member_id), pf
-                    )
+                    f"{perturbed_model_output_dir.format(member_id=member_id)}/{pf}"
                 )
                 diff_data = ref_data.copy()
                 varnames = [
@@ -171,12 +178,12 @@ def cdo_table(
                 ]
 
                 for v in varnames:
-                    diff_data.variables.get(v).values = rel_diff(
+                    diff_data.variables.get(v).values = compute_rel_diff(
                         ref_data.variables.get(v).values,
                         perturb_data.variables.get(v).values,
                     )
 
-                diff_data.to_netcdf("{}/{}".format(tmpdir, rf))
+                diff_data.to_netcdf(f"{tmpdir}/{rf}")
                 ref_data.close()
                 perturb_data.close()
                 diff_data.close()
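The `.format` calls replaced in this file are behavior-preserving: an f-string interpolates the same values into the same template. A quick check with hypothetical path values (the directory and file names below are made up):

```python
model_output_dir = "/data/output"  # hypothetical
rf = "exp_atm_3d_ml.nc"            # hypothetical

old_style = "{}/{}".format(model_output_dir, rf)
new_style = f"{model_output_dir}/{rf}"
assert old_style == new_style == "/data/output/exp_atm_3d_ml.nc"
```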
@@ -191,7 +198,7 @@ def cdo_table(
                 df.loc[:, (t, cdo_bins)].sum(axis=1), axis=0
             )
 
-    logger.info("writing cdo table to {}.".format(cdo_table_file))
+    logger.info("writing cdo table to %s.", cdo_table_file)
 
     Path(cdo_table_file).parent.mkdir(parents=True, exist_ok=True)
     df.to_csv(cdo_table_file)
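The switch from `"...{}".format(...)` to `logger.info("... %s.", value)` defers string interpolation to the logging machinery, which only formats the message if the record is actually emitted; this is the style pylint's logging-format-interpolation check (W1202) pushes toward. A minimal sketch with a throwaway logger:

```python
import io
import logging

# Route a demo logger into a buffer so the output can be inspected.
buf = io.StringIO()
logger = logging.getLogger("cdo_table_demo")
logger.addHandler(logging.StreamHandler(buf))
logger.setLevel(logging.INFO)

# %s is filled in lazily by the logging framework, not eagerly by .format().
logger.info("writing cdo table to %s.", "cdo_table.csv")

assert "writing cdo table to cdo_table.csv." in buf.getvalue()
```

If the log level suppressed the record, the `%s` substitution would never run at all, which is the practical benefit for hot code paths.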
