Commit 3271e1b

Merge pull request #342 from Climate-REF/update-docs
2 parents 5db9479 + 3a674dd commit 3271e1b

File tree

18 files changed

+913
-93
lines changed

changelog/342.docs.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Add Getting Started section for ingesting and solving

docs/development.md

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ MAMBA_PLATFORM=osx-64 uv run ref providers create-env --provider pmp
158158
To update a conda-lock file, run for example:
159159

160160
```bash
161-
uvx uvx conda-lock -p linux-64 -p osx-64 -p osx-arm64 -f packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/requirements/environment.yml
161+
uvx conda-lock -p linux-64 -p osx-64 -p osx-arm64 -f packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/requirements/environment.yml
162162
mv conda-lock.yml packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/requirements/conda-lock.yml
163163
```
164164

docs/getting-started/01-configure.md

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
1+
# Configuration
2+
3+
This tutorial assumes that you have already installed Climate-REF and are using a Linux or MacOS operating system.
4+
The `ref` CLI tool should be available in your terminal after installation
5+
(or via `uv run ref` if you are installing from source).
6+
For installation instructions, see [Installation](../installation.md).
7+
8+
Climate-REF uses a TOML configuration file to specify data paths, output directories, and other settings. In this step, we'll generate and customize your configuration file.
9+
10+
Additional information about the configuration file can be found in the [Configuration documentation](../configuration.md).
11+
12+
13+
## 1. Select a location for storing your configuration
14+
15+
The most important part of the REF configuration is the location where the REF will store its data and results.
16+
This is determined using the `$REF_CONFIGURATION` environment variable.
17+
The data and results can take up a large amount of disk space, so it is important to choose a location with sufficient storage.
18+
19+
If no value is provided a default location will be used, but this will not be suitable for most users
20+
who use shared computing facilities.
21+
22+
This environment variable can be set in your shell configuration file (e.g., `.bashrc`, `.zshrc`, etc.)
23+
or exported directly in your terminal session.
24+
25+
```bash
26+
export REF_CONFIGURATION="/path/to/your/ref/configuration"
27+
```
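
Before committing to a location, it can help to check how much free space its filesystem has. The following is a minimal sketch, not part of Climate-REF itself; the fallback path `$HOME/climate-ref` is a placeholder assumed for illustration. It walks up to the nearest directory that already exists, so it works even before the configuration directory has been created:

```bash
# Placeholder fallback (an assumption for illustration); normally REF_CONFIGURATION is already exported.
target="${REF_CONFIGURATION:-$HOME/climate-ref}"

# Walk up to the nearest existing ancestor, since the target may not exist yet.
probe="$target"
while [ ! -d "$probe" ]; do
    probe="$(dirname "$probe")"
done

# Report free space on the filesystem that would hold the configuration directory.
df -h "$probe"
```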
28+
29+
30+
## 2. Generate
31+
32+
Climate-REF provides a command to write out the default configuration.
33+
34+
```bash
35+
mkdir $REF_CONFIGURATION
36+
ref config list > $REF_CONFIGURATION/ref.toml
37+
```
38+
39+
These commands create the `$REF_CONFIGURATION` directory and write a `ref.toml` inside it with the default configuration settings.
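
If you expect to re-run this step, a slightly more defensive variant may be useful. The sketch below wraps the same two commands in a hypothetical helper function, `write_default_config` (not part of the `ref` CLI), that tolerates an existing directory and refuses to overwrite an existing `ref.toml`:

```bash
# Hypothetical helper: write the default configuration without clobbering an existing file.
write_default_config() {
    local dir="$1"
    mkdir -p "$dir" || return 1   # -p: no error if the directory already exists
    if [ -e "$dir/ref.toml" ]; then
        echo "Keeping existing $dir/ref.toml" >&2
        return 0
    fi
    # Write to a temporary file first so a failed command does not leave an empty ref.toml behind.
    if ref config list > "$dir/ref.toml.tmp"; then
        mv "$dir/ref.toml.tmp" "$dir/ref.toml"
    else
        rm -f "$dir/ref.toml.tmp"
        return 1
    fi
}

# Usage: write_default_config "$REF_CONFIGURATION"
```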
40+
41+
/// admonition | Note
42+
43+
The location where the REF looks for the configuration file can be viewed by running a CLI command with the `-v` flag:
44+
45+
```
46+
$ ref -v config list
47+
2025-05-28 10:45:29.244 +10:00 | DEBUG | climate_ref.cli - Configuration loaded from: /path/to/your/climate-ref/.ref/ref.toml
48+
...
49+
```
50+
51+
///
52+
53+
## 3. Edit your configuration
54+
55+
Open `$REF_CONFIGURATION/ref.toml` in your editor of choice.
56+
You will see a template configuration file with sections for logging, paths, database settings, and diagnostic providers.
57+
These should be customized to suit your environment and preferences.
58+
59+
60+
61+
An example configuration file, with some placeholders, might look like this:
62+
63+
```toml
64+
log_level = "INFO"
65+
log_format = "<green>{time:YYYY-MM-DD HH:mm:ss.SSS Z}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan> - <level>{message}</level>"
66+
67+
[paths]
68+
log = "$REF_CONFIGURATION/log"
69+
scratch = "$REF_CONFIGURATION/scratch"
70+
software = "$REF_CONFIGURATION/software"
71+
results = "$REF_CONFIGURATION/results"
72+
dimensions_cv = "$REF_INSTALL_DIR/climate-ref-core/src/climate_ref_core/pycmec/cv_cmip7_aft.yaml"
73+
74+
[db]
75+
database_url = "sqlite:///$REF_CONFIGURATION/db/climate_ref.db"
76+
run_migrations = true
77+
max_backups = 5
78+
79+
[executor]
80+
executor = "climate_ref.executor.LocalExecutor"
81+
82+
[executor.config]
83+
84+
[[diagnostic_providers]]
85+
provider = "climate_ref_esmvaltool.provider"
86+
87+
[diagnostic_providers.config]
88+
89+
[[diagnostic_providers]]
90+
provider = "climate_ref_ilamb.provider"
91+
92+
[diagnostic_providers.config]
93+
94+
[[diagnostic_providers]]
95+
provider = "climate_ref_pmp.provider"
96+
97+
[diagnostic_providers.config]
98+
```
99+
100+
101+
The particularly important sections to customize are:
102+
103+
- **paths**: Set the paths for logs, scratch space, software, and results. These should point to directories where you have write access.
104+
- **db**: Configure the database URL. By default, it uses SQLite, but you can change it to PostgreSQL or another database if needed.
105+
- **executor**: Set the executor type. The default is `LocalExecutor`, but you can change it to `CeleryExecutor` or `HPCExecutor` for distributed execution (see the [Executor documentation](../how-to-guides/executors.md) for more details).
106+
- **diagnostic_providers**: List the diagnostic providers you want to use. The default includes ESMValTool, ILAMB, and PMP. You can add or remove providers as needed.
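
Since the REF needs write access to each of the `paths` locations, it may be worth checking them before running anything. A minimal sketch, using a hypothetical helper `check_writable` (not part of the `ref` CLI) and assuming the default layout shown in the example configuration:

```bash
# Hypothetical helper: create each directory if it is missing and confirm it is writable.
check_writable() {
    local rc=0 dir
    for dir in "$@"; do
        if mkdir -p "$dir" 2>/dev/null && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "NOT writable: $dir" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage (bash brace expansion):
# check_writable "$REF_CONFIGURATION"/{log,scratch,software,results}
```

The function returns a non-zero status if any directory fails the check, so it can be used as a guard in a setup script.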
107+
108+
## 4. Environment variables
109+
110+
Optionally, you can export environment variables instead of hardcoding paths. See the [Environment Variables documentation](../configuration.md#additional-environment-variables) for more details.
111+
112+
One important environment variable is `REF_DATASET_CACHE_DIR`,
113+
which specifies where the REF will cache downloaded datasets.
114+
This can amount to many gigabytes of data, so it is recommended to set this to a scratch filesystem or a location with sufficient disk space.
115+
116+
This can be set as follows:
117+
118+
```bash
119+
export REF_DATASET_CACHE_DIR="/path/to/your/dataset/cache"
120+
```
121+
122+
If set, Climate-REF will use these environment variables in preference to the configuration file.
123+
124+
125+
## 5. Validate your configuration
126+
127+
To ensure your configuration is valid and correctly read by the REF, you can run the following command:
128+
129+
```bash
130+
ref config list
131+
```
132+
133+
Your configuration should be displayed without errors and should include any changes you made in the `ref.toml` file.
134+
135+
136+
## 6. Create provider-specific conda environments
137+
138+
Some diagnostic providers require specific conda environments to be created before they can be used.
139+
This should happen before you run any diagnostics to avoid multiple installations of the same environment.
140+
By default, these conda environments will be installed in the `$REF_CONFIGURATION/software` directory,
141+
but the location can be changed in the configuration file using the [`paths.software`](../configuration.md#paths_software) setting.
142+
143+
You can create these environments using the following command:
144+
145+
```bash
146+
ref providers create-env
147+
```
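
To create the environment for a single provider rather than all of them, the `--provider` option can be used (it appears in the development guide as `ref providers create-env --provider pmp`). The sketch below loops over a list of provider names with a hypothetical helper, `create_provider_envs`; only the `pmp` name is confirmed here, so treat any other names you pass as assumptions to check against your installation:

```bash
# Hypothetical helper: create conda environments one provider at a time,
# stopping at the first failure.
create_provider_envs() {
    local provider
    for provider in "$@"; do
        echo "Creating environment for provider: $provider"
        ref providers create-env --provider "$provider" || return 1
    done
}

# Usage: create_provider_envs pmp
```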
148+
149+
## Next steps
150+
151+
After configuring, proceed to the [Download Datasets](02-download-datasets.md) tutorial to load your data into Climate-REF.

docs/getting-started/02-download-datasets.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
# Download Required Datasets
2+
3+
This tutorial covers how to fetch all reference datasets needed to run Climate-REF diagnostics. Ingesting these datasets is covered in the next tutorial.
4+
5+
These commands should be rerun after new releases of Climate-REF to ensure you have the latest datasets.
6+
7+
## Reference dataset requirements
8+
9+
Climate-REF uses public, open-license reference data.
10+
Where possible, datasets from [obs4MIPs](https://pcmdi.github.io/obs4MIPs/) are recommended—they are CMOR-compliant, openly licensed, and archived on ESGF.
11+
12+
During development, additional datasets have been identified for inclusion in obs4MIPs and will be added as they become available.
13+
This collection of datasets is referred to as `obs4REF` in the Climate-REF documentation.
14+
15+
/// admonition | Note
16+
17+
By default, fetched data is stored in a cache directory in your user directory.
18+
19+
You can override this location by setting the `REF_DATASET_CACHE_DIR` environment variable:
20+
21+
```bash
22+
export REF_DATASET_CACHE_DIR=/path/to/cache
23+
```
24+
25+
///
26+
27+
[](){#fetch-obs4ref-datasets}
28+
## 1. Fetching obs4REF datasets
29+
30+
Use the `ref datasets fetch-data` command to retrieve each registry. Replace example paths with your desired output directories.
31+
32+
The obs4REF datasets are hosted temporarily in one location until they become available on ESGF.
33+
This archive is ~30 GB in size, so ensure you have sufficient disk space available.
34+
In the future, these datasets will be available on ESGF and can be fetched directly from there. Until then, fetch them with:
35+
36+
```bash
37+
ref datasets fetch-data --registry obs4ref --output-directory $REF_CONFIGURATION/datasets/obs4ref
38+
```
39+
40+
[](){#fetch-pmp-climatology-datasets}
41+
## 2. PMP Climatology datasets
42+
43+
PMP has generated a set of climatology datasets based on obs4MIPs data.
44+
These datasets are used for the PMP diagnostics and are not part of the obs4REF collection.
45+
46+
```bash
47+
ref datasets fetch-data --registry pmp-climatology --output-directory $REF_CONFIGURATION/datasets/pmp-climatology
48+
```
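
The two fetch commands above follow the same pattern, so they can be wrapped in a loop. This is a sketch using a hypothetical helper, `fetch_registries` (not part of the `ref` CLI), that places each registry in its own subdirectory of a base directory:

```bash
# Hypothetical helper: fetch several registries into <base>/<registry>,
# stopping at the first failure.
fetch_registries() {
    local base="$1"
    shift
    local registry
    for registry in "$@"; do
        ref datasets fetch-data --registry "$registry" --output-directory "$base/$registry" || return 1
    done
}

# Usage: fetch_registries "$REF_CONFIGURATION/datasets" obs4ref pmp-climatology
```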
49+
50+
## 3. Provider-specific datasets
51+
52+
Some diagnostics require additional datasets that are not ingested into the REF,
53+
but must be fetched separately.
54+
These datasets will eventually be integrated into the REF, but for now, they can be fetched using the following commands:
55+
56+
57+
```bash
58+
ref datasets fetch-data --registry ilamb
59+
ref datasets fetch-data --registry iomb
60+
ref datasets fetch-data --registry esmvaltool
61+
```
62+
63+
[//]: # (TODO: Add links to CLI reference once available)
64+
[//]: # (For more options and details, see the [Datasets CLI reference]&#40;../how-to-guides/ingest-datasets.md&#41;.)
65+
66+
## Next steps
67+
68+
After fetching your data, proceed to the [Ingest datasets](03-ingest.md) tutorial to load them into Climate-REF.

docs/getting-started/03-ingest.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
# Ingest Datasets
2+
3+
Ingestion extracts metadata from your locally downloaded datasets and stores it in a local catalog for easy querying and filtering.
4+
This makes subsequent operations, such as running diagnostics, more efficient as the system can quickly access the necessary metadata without needing to reprocess the files.
5+
6+
Before you begin, ensure you have:
7+
8+
- Fetched your reference data (see [Download Required Datasets](02-download-datasets.md)).
9+
- CMOR-compliant files accessible either locally or on a mounted filesystem.
10+
11+
## 1. Ingest reference datasets
12+
13+
The `obs4REF` collection downloaded in the previous step uses the `obs4mips` source type because the data are obs4MIPs-compatible. The following command extracts metadata from the files, stores it in the Climate-REF catalog, and prints a summary of the ingested datasets.
14+
15+
```bash
16+
ref datasets ingest --source-type obs4mips $REF_CONFIGURATION/datasets/obs4ref
17+
```
18+
19+
Replace `$REF_CONFIGURATION/datasets/obs4ref` with the directory used when you [fetched the obs4REF data](02-download-datasets.md#fetch-obs4ref-datasets).
20+
21+
## 2. Ingest PMP Climatology data
22+
23+
Use the `pmp-climatology` source type:
24+
25+
```bash
26+
ref datasets ingest --source-type pmp-climatology $REF_CONFIGURATION/datasets/pmp-climatology
27+
```
28+
29+
This registry contains pre-computed climatology fields used by the PMP diagnostics.
30+
Replace `$REF_CONFIGURATION/datasets/pmp-climatology` with the directory used when you [fetched the pmp-climatology data](02-download-datasets.md#fetch-pmp-climatology-datasets).
31+
32+
## 3. Ingest CMIP6 data
33+
34+
To ingest CMIP6 files, point the CLI at a directory of netCDF files and set `cmip6` as the source type:
35+
36+
```bash
37+
ref datasets ingest --source-type cmip6 /path/to/cmip6/data
38+
```
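
Before ingesting a large directory, a quick count of the netCDF files it contains can confirm that the path is correct. A minimal sketch, assuming the files use the `.nc` extension; `count_netcdf` is a hypothetical helper, not a `ref` command:

```bash
# Hypothetical helper: count entries ending in .nc under a directory (recursively).
count_netcdf() {
    find "$1" -name '*.nc' | wc -l | tr -d ' '
}

# Usage: count_netcdf /path/to/cmip6/data
```

A count of zero usually means the path points at the wrong level of the directory tree.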
39+
40+
41+
Glob-style paths can be used to specify multiple directories or file patterns.
42+
For example, if you have CMIP6 data organized by the CMIP6 DRS,
43+
you can use the following command to ingest all monthly and ancillary variables:
44+
45+
```bash
46+
ref datasets ingest --source-type cmip6 /path/to/cmip6/data/CMIP6/*/*/*/*/*/*mon /path/to/cmip6/data/CMIP6/*/*/*/*/*/*fx --n-jobs 64
47+
```
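
In an unquoted command like the one above, bash expands the patterns before `ref` runs, so you can preview what will be passed. A small sketch with a hypothetical helper, `preview_matches`, that prints only the arguments that exist on disk (with bash's default settings an unmatched pattern is passed through literally, so it is skipped here):

```bash
# Hypothetical helper: print each argument that exists on disk.
preview_matches() {
    local candidate
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            echo "$candidate"
        fi
    done
}

# Usage: preview_matches /path/to/cmip6/data/CMIP6/*/*/*/*/*/*mon
```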
48+
49+
/// admonition | Tip
50+
51+
As part of the Climate-REF test suite,
52+
we provide a sample set of CMIP6 (and obs4REF) data that can be used for testing and development purposes.
53+
These datasets have been decimated to reduce their size.
54+
These datasets should not be used for production runs, but they are useful for testing the ingestion and diagnostic processes.
55+
56+
To fetch and ingest the sample CMIP6 data, run the following commands:
57+
58+
```bash
59+
ref datasets fetch-data --registry sample-data --output-directory $REF_CONFIGURATION/datasets/sample-data
60+
ref datasets ingest --source-type cmip6 $REF_CONFIGURATION/datasets/sample-data/CMIP6
61+
```
62+
63+
///
64+
65+
## 4. Query your catalog
66+
67+
After ingestion, list the datasets to verify:
68+
69+
```bash
70+
ref datasets list
71+
```
72+
73+
You can also filter by column:
74+
75+
```bash
76+
ref datasets list --column instance_id --column variable_id
77+
```
78+
79+
80+
[//]: # (TODO: Add links to CLI reference once available)
81+
[//]: # (For a complete list of flags, see the [Datasets CLI reference]&#40;../how-to-guides/ingest-datasets.md&#41;.)
82+
83+
## Next steps
84+
85+
With your data cataloged, you’re ready to run diagnostics. Proceed to the [Solve tutorial](04-solve.md).

docs/getting-started/04-solve.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
# Solve Diagnostics
2+
3+
With your datasets ingested and cataloged, you can now solve and execute diagnostics using the `ref solve` command.
4+
5+
## 1. Run all diagnostics (default)
6+
7+
By default, `ref solve` will discover and schedule _all_ available diagnostics across all providers. The default executor is the **local executor**, which runs diagnostics in parallel using a process pool:
8+
9+
```bash
10+
ref solve --timeout 3600
11+
```
12+
13+
This will:
14+
15+
- Query the catalog of ingested datasets (observations and model-output)
16+
- Determine which diagnostics are applicable and how many different executions are needed
17+
- Execute each diagnostic in parallel on your machine
18+
- Use a timeout of 3600 seconds (1 hour) to complete the runs
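
The `--timeout` value in the example is given in seconds, so longer limits are easiest to express with shell arithmetic. A minimal sketch; `hours_to_seconds` is a hypothetical helper, not part of the `ref` CLI:

```bash
# Hypothetical helper: convert a whole number of hours to seconds.
hours_to_seconds() {
    echo $(( $1 * 60 * 60 ))
}

# Usage (two-hour limit): ref solve --timeout "$(hours_to_seconds 2)"
```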
19+
20+
Note: it is normal for some executions to fail (e.g., due to missing data or configuration).
21+
You can re-run or inspect failures as needed.
22+
23+
/// admonition | Tip
24+
25+
To target a specific provider or diagnostic, use the `--provider` and `--diagnostic` flags:
26+
27+
```bash
28+
# Run only PMP diagnostics
29+
ref solve --provider pmp
30+
31+
# Run only diagnostics containing "enso" in their slug
32+
ref solve --diagnostic enso
33+
```
34+
35+
Replace `pmp` or `enso` with any provider or diagnostic slug listed in your installation.
36+
///
37+
38+
## 2. Monitor execution status
39+
40+
You can view the status of execution groups with:
41+
42+
```bash
43+
ref executions list-group
44+
```
45+
46+
Each group corresponds to a set of related executions (e.g., all runs of a diagnostic for one model).
47+
To see details for a specific group, use:
48+
49+
```bash
50+
ref executions inspect <group_id>
51+
```
52+
53+
This will show the status (pending, running, succeeded, failed) of each execution in the group and any error messages.
54+
This log output is very useful to include if you need to [report an issue or seek help](https://github.com/Climate-REF/climate-ref/issues).
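
To attach this output to an issue, it can be redirected to a file. A sketch using a hypothetical helper, `save_inspect_log`, which captures both standard output and standard error; the file name pattern is an arbitrary choice:

```bash
# Hypothetical helper: save the inspect output for one execution group to a log file.
save_inspect_log() {
    local group_id="$1"
    local out="execution-group-${group_id}.log"
    ref executions inspect "$group_id" > "$out" 2>&1
    local rc=$?
    echo "Wrote $out"
    return "$rc"
}

# Usage: save_inspect_log <group_id>
```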
55+
56+
## Next steps
57+
58+
Once diagnostics have completed, visualize the results in the [Visualise tutorial](05-visualise.md).
