I've now started developing the inference-vs-persistence plots. I thought I would share my initial ideas for the design of the package, so that we are aligned before I code too much. Please provide any comments/thoughts/feedback! :)
- First of all, I thought this could be the start of a verification package for the mllam community, hence the repo name "mllam-verification".
- I used the repo structure of mllam-dataprep as a starting point.
- I've defined an initial config file example with the following layout (indentation reconstructed; the exact nesting of `variables` and `coord_ranges` under `inputs` is my reading of the flattened original):

```yaml
schema_version: v0.1.0
inputs:
  datasets:
    initial:
      path: /path/to/initial.zarr
    target:
      path: /path/to/target.zarr
    prediction:
      path: /path/to/prediction.zarr
  variables:
    - 2t
    - 10u
  coord_ranges:
    time:
      start: 1990-09-03T00:00
      end: 1990-09-09T00:00
      step: PT3H
methods:
  - global_persistence
  - gridpoint_persistence
output:
  path: /path/to/output/directory
```
- Since I like pydantic, I propose using it for config validation.
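As a rough illustration of what that validation could look like, here is a minimal pydantic sketch of the config above. All class and field names are assumptions on my part, not a proposed final API:

```python
from pydantic import BaseModel


class Dataset(BaseModel):
    path: str


class TimeRange(BaseModel):
    start: str
    end: str
    step: str  # ISO 8601 duration, e.g. "PT3H"


class Inputs(BaseModel):
    datasets: dict[str, Dataset]
    variables: list[str]
    coord_ranges: dict[str, TimeRange] = {}


class Output(BaseModel):
    path: str


class Config(BaseModel):
    schema_version: str
    inputs: Inputs
    methods: list[str]
    output: Output


# Construct a config programmatically; in practice this would come from
# parsing the YAML file (e.g. via yaml.safe_load) and passing the dict in.
config = Config(
    schema_version="v0.1.0",
    inputs=Inputs(
        datasets={"target": Dataset(path="/path/to/target.zarr")},
        variables=["2t", "10u"],
    ),
    methods=["global_persistence"],
    output=Output(path="/path/to/output/directory"),
)
```

Invalid or missing fields would then raise a `ValidationError` at parse time rather than failing somewhere mid-computation.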
Some thoughts related to this structure:
- I was about to call the "target" dataset "truth", but since we might also want to use the package to, e.g., verify one model's inference against another model's inference, I went with "target". So the "target" dataset is what we verify against, and the "prediction" dataset is what we verify.
- I added "coord_ranges" to make it possible to verify only a subset, e.g. in time or space.
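Applying such a range is essentially a coordinate slice. A small sketch with xarray, using made-up data and the time window from the example config:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset: 3-hourly values from 1990-09-01 00:00 to 1990-09-09 00:00.
times = pd.date_range("1990-09-01", periods=65, freq="3h")
ds = xr.Dataset(
    {"2t": ("time", np.arange(65.0))},
    coords={"time": times},
)

# Restrict verification to the configured coord_ranges time window.
subset = ds.sel(time=slice("1990-09-03T00:00", "1990-09-09T00:00"))
```

Spatial subsetting would work the same way via additional entries in `coord_ranges` mapped to further `.sel` arguments.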
- I propose that we use the same setup as we agreed upon for the statistics calculation in mllam-dataprep (see mllam-data-prep#42, "Add support for writing more composite statistics (e.g. grid-point based mean of time-step differences)"). That is, we define which verification methods we want to calculate in the "methods" section. When parsing the config, we verify that those methods can be imported from within the package; if they cannot, the script fails.
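The import check at parse time could be as simple as the following sketch; the module path `mllam_verification.methods` is a placeholder for wherever the methods end up living:

```python
import importlib


def validate_methods(methods, module_name="mllam_verification.methods"):
    """Fail early if any configured method is not defined in the package.

    `module_name` is a hypothetical location; adjust to the real module.
    """
    module = importlib.import_module(module_name)
    missing = [m for m in methods if not hasattr(module, m)]
    if missing:
        raise ValueError(f"Unknown verification methods: {missing}")
```

This keeps the config purely declarative while still catching typos like `gridpoint_persistance` before any data is loaded.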
- I thought we would be interested not only in saving plots to disk, but also the datasets with the verification metrics. For now I've just added a "path" parameter to the "output" section, so plots and verification datasets are saved to the same path. We can elaborate on this if needed, e.g. which variables to save.
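For concreteness, here is a sketch of what one of the "methods" entries could compute: a `global_persistence`-style RMSE, where the persistence forecast holds the initial state fixed and the error is averaged over all grid points per lead time. The function name matches the config entry, but the signature and internals are assumptions:

```python
import numpy as np
import xarray as xr


def global_persistence(ds_initial, ds_target, variable):
    """RMSE of the persistence forecast against the target, per lead time.

    Hypothetical signature: datasets as in the config's "initial"/"target"
    entries, plus one variable name from the "variables" list.
    """
    # Persistence forecast: repeat the initial field at every target time.
    persistence = ds_initial[variable].broadcast_like(ds_target[variable])
    err = persistence - ds_target[variable]
    # Average over everything except time -> one RMSE value per lead time.
    spatial_dims = [d for d in err.dims if d != "time"]
    return np.sqrt((err ** 2).mean(dim=spatial_dims))
```

A `gridpoint_persistence` variant would simply skip the spatial mean and return a field per lead time, which maps naturally onto the two plot types.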