Climate-REF
diff --git a/changelog/342.docs.md b/changelog/342.docs.md
Lines changed: 1 addition & 0 deletions
diff --git a/docs/introduction/basic-concepts.md b/docs/background/basic-concepts.md
Renamed from docs/introduction/basic-concepts.md to docs/background/basic-concepts.md
diff --git a/docs/development.md b/docs/development.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/getting-started/01-configure.md b/docs/getting-started/01-configure.md
Lines changed: 151 additions & 0 deletions
diff --git a/docs/getting-started/02-download-datasets.md b/docs/getting-started/02-download-datasets.md
Lines changed: 68 additions & 0 deletions
diff --git a/docs/getting-started/03-ingest.md b/docs/getting-started/03-ingest.md
Lines changed: 85 additions & 0 deletions
diff --git a/docs/getting-started/04-solve.md b/docs/getting-started/04-solve.md
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1 @@
+Add Getting Started section for ingesting and solving
@@ -158,7 +158,7 @@ MAMBA_PLATFORM=osx-64 uv run ref providers create-env --provider pmp
 To update a conda-lock file, run for example:
 
 ```bash
-uvx uvx conda-lock -p linux-64 -p osx-64 -p osx-arm64 -f packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/requirements/environment.yml
+uvx conda-lock -p linux-64 -p osx-64 -p osx-arm64 -f packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/requirements/environment.yml
 mv conda-lock.yml packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/requirements/conda-lock.yml
 ```
 
 
@@ -0,0 +1,151 @@
+# Configuration
+
+This tutorial assumes that you have already installed Climate-REF and are using a Linux or macOS operating system.
+The `ref` CLI tool should be available in your terminal after installation
+(or via `uv run ref` if you are installing from source).
+For installation instructions, see [Installation](../installation.md).
+
+Climate-REF uses a TOML configuration file to specify data paths, output directories, and other settings. In this step, we'll generate and customize your configuration file.
+
+Additional information about the configuration file can be found in the [Configuration documentation](../configuration.md).
+
+
+## 1. Select a location for storing your configuration
+
+The most important part of the REF configuration is the location where the REF stores its data and results.
+This location is set with the `$REF_CONFIGURATION` environment variable.
+The data and results can take up a large amount of disk space, so choose a location with sufficient storage.
+
+If no value is provided, a default location is used, but this will not be suitable for most users
+who use shared computing facilities.
+
+This environment variable can be set in your shell configuration file (e.g., `.bashrc`, `.zshrc`, etc.)
+or exported directly in your terminal session.
+
+```bash
+export REF_CONFIGURATION="/path/to/your/ref/configuration"
+```
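+
+To make the setting persist across terminal sessions, the export line can be appended to your shell configuration file.
+The sketch below is one way to do that without adding the line twice; `persist_export` is a hypothetical helper name, not part of Climate-REF.
+
+```bash
+# Hypothetical helper: append a line to a shell rc file only if it is not already there.
+persist_export() {
+    rc_file="$1"   # e.g. "$HOME/.bashrc"
+    line="$2"      # the exact export line to add
+    # -q: quiet, -x: match the whole line, -F: treat the line as a fixed string
+    grep -qxF -- "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
+}
+```
+
+For example, `persist_export "$HOME/.bashrc" 'export REF_CONFIGURATION="/path/to/your/ref/configuration"'` adds the line on the first call and does nothing on later calls.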
+
+
+## 2. Generate
+
+Climate-REF provides a command to write out the default configuration.
+
+```bash
+mkdir -p "$REF_CONFIGURATION"
+ref config list > "$REF_CONFIGURATION/ref.toml"
+```
+
+These commands create the `$REF_CONFIGURATION` directory and write a `ref.toml` file containing the default configuration settings inside it.
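+
+Because the shell redirect overwrites any existing file, re-running these commands could discard earlier edits to `ref.toml`.
+A minimal sketch of a guard against that, assuming only the `ref config list` command shown above (the function name `write_default_config` is hypothetical):
+
+```bash
+# Hypothetical helper: write the default configuration only if none exists yet.
+write_default_config() {
+    config_dir="$1"
+    config_file="$config_dir/ref.toml"
+    if [ -e "$config_file" ]; then
+        echo "Keeping existing $config_file" >&2
+        return 0
+    fi
+    mkdir -p "$config_dir"
+    # Write to a temporary file first so a failed command does not leave an empty ref.toml.
+    ref config list > "$config_file.tmp" && mv "$config_file.tmp" "$config_file"
+}
+```
+
+Call it as `write_default_config "$REF_CONFIGURATION"`.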
+
+/// admonition | Note
+
+The location where the REF looks for the configuration file can be viewed by running a CLI command with the `-v` flag:
+
+```
+$ ref -v config list
+2025-05-28 10:45:29.244 +10:00 | DEBUG    | climate_ref.cli - Configuration loaded from: /path/to/your/climate-ref/.ref/ref.toml
+...
+```
+
+///
+
+## 3. Edit your configuration
+
+Open `$REF_CONFIGURATION/ref.toml` in your editor of choice.
+You will see a template configuration file with sections for logging, paths, database settings, and diagnostic providers.
+These should be customized to suit your environment and preferences.
+
+An example configuration file, with some placeholders, might look like this:
+
+```toml
+log_level = "INFO"
+log_format = "<green>{time:YYYY-MM-DD HH:mm:ss.SSS Z}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan> - <level>{message}</level>"
+
+[paths]
+log = "$REF_CONFIGURATION/log"
+scratch = "$REF_CONFIGURATION/scratch"
+software = "$REF_CONFIGURATION/software"
+results = "$REF_CONFIGURATION/results"
+dimensions_cv = "$REF_INSTALL_DIR/climate-ref-core/src/climate_ref_core/pycmec/cv_cmip7_aft.yaml"
+
+[db]
+database_url = "sqlite:///$REF_CONFIGURATION/db/climate_ref.db"
+run_migrations = true
+max_backups = 5
+
+[executor]
+executor = "climate_ref.executor.LocalExecutor"
+
+[executor.config]
+
+[[diagnostic_providers]]
+provider = "climate_ref_esmvaltool.provider"
+
+[diagnostic_providers.config]
+
+[[diagnostic_providers]]
+provider = "climate_ref_ilamb.provider"
+
+[diagnostic_providers.config]
+
+[[diagnostic_providers]]
+provider = "climate_ref_pmp.provider"
+
+[diagnostic_providers.config]
+```
+
+
+The most important sections to customize are:
+
+- **paths**: Set the paths for logs, scratch space, software, and results. These should point to directories where you have write access.
+- **db**: Configure the database URL. By default, it uses SQLite, but you can change it to PostgreSQL or another database if needed.
+- **executor**: Set the executor type. The default is `LocalExecutor`, but you can change it to `CeleryExecutor` or `HPCExecutor` for distributed execution (see the [Executor documentation](../how-to-guides/executors.md) for more details).
+- **diagnostic_providers**: List the diagnostic providers you want to use. The default includes ESMValTool, ILAMB, and PMP. You can add or remove providers as needed.
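+
+Since these directories must be writable, it can help to check them before running anything.
+A minimal sketch, assuming the directory layout from the example configuration above; the fallback path `$HOME/climate-ref` is a placeholder, and whether the REF creates missing directories itself is not covered here:
+
+```bash
+# Placeholder fallback if REF_CONFIGURATION is not set; replace with your own location.
+REF_CONFIGURATION="${REF_CONFIGURATION:-$HOME/climate-ref}"
+
+# Directory names taken from the [paths] and [db] entries of the example ref.toml.
+for name in log scratch software results db; do
+    dir="$REF_CONFIGURATION/$name"
+    mkdir -p "$dir"
+    if [ -w "$dir" ]; then
+        echo "writable: $dir"
+    else
+        echo "NOT writable: $dir" >&2
+    fi
+done
+```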
+
+## 4. Environment variables
+
+Optionally, you can export environment variables instead of hardcoding paths. See the [Environment Variables documentation](../configuration.md#additional-environment-variables) for more details.
+
+One important environment variable is `REF_DATASET_CACHE_DIR`,
+which specifies where the REF will cache downloaded datasets.
+The cache can grow to gigabytes of data, so we recommend pointing it at a scratch filesystem or another location with sufficient disk space.
+
+This can be set as follows:
+
+```bash
+export REF_DATASET_CACHE_DIR="/path/to/your/dataset/cache"
+```
+
+If set, Climate-REF will use these environment variables in preference to the configuration file.
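+
+To see which of these variables are set in the current shell, the standard `env` and `grep` tools are enough. This is a generic shell check, not a Climate-REF command:
+
+```bash
+# List every exported variable whose name starts with REF_ (for example REF_CONFIGURATION).
+env | grep '^REF_' || echo "No REF_* variables are exported in this shell"
+```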
+
+
+## 5. Validate your configuration
+
+To ensure your configuration is valid and correctly read by the REF, you can run the following command:
+
+```bash
+ref config list
+```
+
+Your configuration should be displayed without errors and should include any changes you made in the `ref.toml` file.
+
+
+## 6. Create provider-specific conda environments
+
+Some diagnostic providers require specific conda environments to be created before they can be used.
+This should happen before you run any diagnostics to avoid multiple installations of the same environment.
+By default, these conda environments will be installed in the `$REF_CONFIGURATION/software` directory,
+but the location can be changed in the configuration file using the [paths.software](../configuration.md#paths_software) setting.
+
+You can create these environments using the following command:
+
+```bash
+ref providers create-env
+```
+
+## Next steps
+
+After configuring, proceed to the [Download Datasets](02-download-datasets.md) tutorial to fetch the reference datasets that Climate-REF needs.
@@ -0,0 +1,68 @@
+# Download Required Datasets
+
+This tutorial covers how to fetch all reference datasets needed to run Climate-REF diagnostics. Ingesting these datasets is covered in the next tutorial.
+
+These commands should be rerun after new releases of Climate-REF to ensure you have the latest datasets.
+
+## Reference dataset requirements
+
+Climate-REF uses public, open-license reference data.
+Where possible, datasets from [obs4MIPs](https://pcmdi.github.io/obs4MIPs/) are recommended—they are CMOR-compliant, openly licensed, and archived on ESGF.
+
+During development, additional datasets have been identified for inclusion in obs4MIPs and will be added as they become available.
+This collection of datasets is referred to as `obs4REF` in the Climate-REF documentation.
+
+/// admonition | Note
+
+By default, fetched data is stored in a cache directory inside your user directory.
+
+You can override this location by setting the `REF_DATASET_CACHE_DIR` environment variable:
+
+```bash
+export REF_DATASET_CACHE_DIR=/path/to/cache
+```
+
+///
+
+[](){#fetch-obs4ref-datasets}
+## 1. Fetching obs4REF datasets
+
+Use the `ref datasets fetch-data` command to retrieve each registry. Replace example paths with your desired output directories.
+
+These datasets are temporarily hosted in a single location until they become available on ESGF,
+after which they can be fetched directly from there.
+The archive is ~30 GB in size, so ensure you have sufficient disk space available before running:
+
+```bash
+ref datasets fetch-data --registry obs4ref --output-directory $REF_CONFIGURATION/datasets/obs4ref
+```
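+
+A quick check that the target filesystem has room for the ~30 GB archive might look like the sketch below. It uses only standard tools (`df`, `awk`); the fallback path is a placeholder.
+
+```bash
+# Placeholder fallback if REF_CONFIGURATION is not set; replace with your own location.
+target="${REF_CONFIGURATION:-$HOME/climate-ref}/datasets"
+mkdir -p "$target"
+
+required_kb=$((30 * 1024 * 1024))   # 30 GiB expressed in 1024-byte blocks
+# df -P prints one header line, then one line per filesystem; column 4 is the available space.
+available_kb=$(df -Pk "$target" | awk 'NR == 2 {print $4}')
+
+if [ "$available_kb" -ge "$required_kb" ]; then
+    echo "Enough space for obs4REF in $target"
+else
+    echo "Only $((available_kb / 1024 / 1024)) GiB free in $target" >&2
+fi
+```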
+
+[](){#fetch-pmp-climatology-datasets}
+## 2. PMP Climatology datasets
+
+PMP has generated a set of climatology datasets based on obs4MIPs data.
+These datasets are used for the PMP diagnostics and are not part of the obs4REF collection.
+
+```bash
+ref datasets fetch-data --registry pmp-climatology --output-directory $REF_CONFIGURATION/datasets/pmp-climatology
+```
+
+## 3. Provider-specific datasets
+
+Some diagnostics require additional datasets that are not ingested into the REF,
+but must be fetched separately.
+These datasets will eventually be integrated into the REF, but for now, they can be fetched using the following commands:
+
+
+```bash
+ref datasets fetch-data --registry ilamb
+ref datasets fetch-data --registry iomb
+ref datasets fetch-data --registry esmvaltool
+```
+
+[//]: # (TODO: Add links to CLI reference once available)
+[//]: # (For more options and details, see the [Datasets CLI reference]&#40;../how-to-guides/ingest-datasets.md&#41;.)
+
+## Next steps
+
+After fetching your data, proceed to the [Ingest datasets](03-ingest.md) tutorial to load them into Climate-REF.
@@ -0,0 +1,85 @@
+# Ingest Datasets
+
+Ingestion extracts metadata from your locally downloaded datasets and stores it in a local catalog for easy querying and filtering.
+This makes subsequent operations, such as running diagnostics, more efficient because the system can quickly access the necessary metadata without needing to reprocess the files.
+
+Before you begin, ensure you have:
+
+- Fetched your reference data (see [Download Required Datasets](02-download-datasets.md)).
+- CMOR-compliant files accessible either locally or on a mounted filesystem.
+
+## 1. Ingest reference datasets
+
+The `obs4REF` collection downloaded in the previous step uses the `obs4mips` source type because the data are obs4MIPs compatible. The command below extracts metadata from the files, stores it in the Climate-REF catalog, and prints a summary of the ingested datasets.
+
+```bash
+ref datasets ingest --source-type obs4mips $REF_CONFIGURATION/datasets/obs4ref
+```
+
+Replace `$REF_CONFIGURATION/datasets/obs4ref` with the directory you used when you [fetched the obs4REF data](02-download-datasets.md#fetch-obs4ref-datasets).
+
+## 2. Ingest PMP Climatology data
+
+Use the `pmp-climatology` source type:
+
+```bash
+ref datasets ingest --source-type pmp-climatology $REF_CONFIGURATION/datasets/pmp-climatology
+```
+
+This registry contains pre-computed climatology fields used by the PMP diagnostics.
+Replace `$REF_CONFIGURATION/datasets/pmp-climatology` with the directory you used when you [fetched the pmp-climatology data](02-download-datasets.md#fetch-pmp-climatology-datasets).
+
+## 3. Ingest CMIP6 data
+
+To ingest CMIP6 files, point the CLI at a directory of netCDF files and set `cmip6` as the source type:
+
+```bash
+ref datasets ingest --source-type cmip6 /path/to/cmip6/data
+```
+
+
+Glob-style paths can be used to specify multiple directories or file patterns.
+For example, if you have CMIP6 data organized by the CMIP6 DRS,
+you can use the following command to ingest all monthly and ancillary variables:
+
+```bash
+ref datasets ingest --source-type cmip6 /path/to/cmip6/data/CMIP6/*/*/*/*/*/*mon /path/to/cmip6/data/CMIP6/*/*/*/*/*/*fx --n-jobs 64
+```
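+
+Before starting a long ingest, it can be useful to preview which directories the glob patterns match.
+A minimal sketch; `list_ingest_dirs` is a hypothetical helper, and the patterns are the ones from the command above:
+
+```bash
+# Hypothetical helper: print the directories matched by the monthly and fx patterns.
+list_ingest_dirs() {
+    root="$1"
+    for dir in "$root"/CMIP6/*/*/*/*/*/*mon "$root"/CMIP6/*/*/*/*/*/*fx; do
+        # In bash, an unmatched pattern is left as a literal string, so keep only real directories.
+        if [ -d "$dir" ]; then
+            echo "$dir"
+        fi
+    done
+}
+```
+
+For example, `list_ingest_dirs /path/to/cmip6/data | wc -l` counts the matching directories.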
+
+/// admonition | Tip
+
+As part of the Climate-REF test suite,
+we provide a sample set of CMIP6 (and obs4REF) data that can be used for testing and development purposes.
+These datasets have been decimated to reduce their size.
+These datasets should not be used for production runs, but they are useful for testing the ingestion and diagnostic processes.
+
+To fetch and ingest the sample CMIP6 data, run the following commands:
+
+```bash
+ref datasets fetch-data --registry sample-data --output-directory $REF_CONFIGURATION/datasets/sample-data
+ref datasets ingest --source-type cmip6 $REF_CONFIGURATION/datasets/sample-data/CMIP6
+```
+
+///
+
+## 4. Query your catalog
+
+After ingestion, list the datasets to verify:
+
+```bash
+ref datasets list
+```
+
+You can also restrict the output to specific columns:
+
+```bash
+ref datasets list --column instance_id --column variable_id
+```
+
+
+[//]: # (TODO: Add links to CLI reference once available)
+[//]: # (For a complete list of flags, see the [Datasets CLI reference]&#40;../how-to-guides/ingest-datasets.md&#41;.)
+
+## Next steps
+
+With your data cataloged, you’re ready to run diagnostics. Proceed to the [Solve tutorial](04-solve.md).
@@ -0,0 +1,58 @@
+# Solve Diagnostics
+
+With your datasets ingested and cataloged, you can now solve and execute diagnostics using the `ref solve` command.
+
+## 1. Run all diagnostics (default)
+
+By default, `ref solve` will discover and schedule _all_ available diagnostics across all providers. The default executor is the **local executor**, which runs diagnostics in parallel using a process pool:
+
+```bash
+ref solve --timeout 3600
+```
+
+This will:
+
+- Query the catalog of ingested datasets (observations and model output)
+- Determine which diagnostics are applicable and how many different executions are needed
+- Execute each diagnostic in parallel on your machine
+- Use a timeout of 3600 seconds (1 hour) to complete the runs
+
+Note: it is normal for some executions to fail (e.g., due to missing data or configuration).
+You can re-run or inspect failures as needed.
+
+/// admonition | Tip
+
+To target a specific provider or diagnostic, use the `--provider` and `--diagnostic` flags:
+
+```bash
+# Run only PMP diagnostics
+ref solve --provider pmp
+
+# Run only diagnostics containing "enso" in their slug
+ref solve --diagnostic enso
+```
+
+Replace `pmp` or `enso` with any provider or diagnostic slug listed in your installation.
+
+///
+
+## 2. Monitor execution status
+
+You can view the status of execution groups with:
+
+```bash
+ref executions list-group
+```
+
+Each group corresponds to a set of related executions (e.g., all runs of a diagnostic for one model).
+To see details for a specific group, use:
+
+```bash
+ref executions inspect <group_id>
+```
+
+This will show the status (pending, running, succeeded, failed) of each execution in the group and any error messages.
+This log output is very useful to include if you need to [report an issue or seek help](https://github.com/Climate-REF/climate-ref/issues).
+
+## Next steps
+
+Once diagnostics have completed, visualize the results in the [Visualise tutorial](05-visualise.md).