
[WIP] Updated land calibration pipeline #1210


Draft
wants to merge 4 commits into
base: main


ph-kev
Member

@ph-kev ph-kev commented Jul 7, 2025

This PR rewrites the land calibration pipeline to use the latest snowy land model, the new ObservationRecipe in ClimaCalibrate, and ClimaAnalysis for data preprocessing and transformations. With these changes, the pipeline should

  • be simpler to understand
  • offload the covariance matrix computation to ClimaCalibrate
  • be less error-prone when adding priors, modifying or adding observations, modifying the calibration configuration, etc.

However, the updated land calibration pipeline is brittle in several respects, and these issues also affect the other calibration pipelines. The issues specific to the land calibration pipeline are listed below.

  1. Overwriting parameters is painful. There is no easy way to overwrite parameters: every calibrated parameter has to be unpacked by name, by hand. See the example below from the current land calibration pipeline.

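# `params` maps each parameter name to an entry with a "value" field
p_names = collect(keys(params))
p_values = [params[name]["value"] for name in p_names]
# Rebind `params` to a NamedTuple so parameters can be destructured by name
params = (; zip(Symbol.(p_names), p_values)...)
# Only the uncommented names are unpacked; changing the set means editing this list by hand
(;
    # pc,
    # sc,
    # K_sat_plant,
    # a,
    # h_leaf,
    # α_snow,
    # α_soil_dry_scaler,
    # τ_leaf_scaler,
    # α_leaf_scaler,
    # α_soil_scaler,
    α_0,
    Δα,
    k,
    beta_snow,
    x0_snow,
    gamma_snow,
    beta_0,
    # beta_min,
    z0_snow,
) = params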
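
One way the manual unpacking above could be avoided, sketched here under assumptions: keep the default parameters in a NamedTuple and `merge` the calibrated values over them. The function `overwrite_params`, the name `default_params`, and all numeric values are hypothetical and not part of this PR; only `merge` and the `"value"` entry layout come from Julia and the snippet above.

```julia
# Hypothetical defaults; names and values are illustrative only.
default_params = (; α_0 = 0.6, Δα = 0.2, k = 2.0, z0_snow = 0.001)

# Parsed parameter file in the same shape as above: name => entry with a "value" field.
calibrated = Dict("α_0" => Dict("value" => 0.7), "k" => Dict("value" => 1.5))

"""
Return `defaults` with every parameter in `calibrated` overwritten.
Throws an error if `calibrated` contains a name that is not in `defaults`.
"""
function overwrite_params(defaults::NamedTuple, calibrated::AbstractDict)
    names = Symbol.(collect(keys(calibrated)))
    unknown = setdiff(names, keys(defaults))
    isempty(unknown) || error("Unknown parameter(s): $(join(unknown, ", "))")
    # Build a NamedTuple of overrides and let `merge` replace matching defaults.
    overrides = (; (Symbol(name) => entry["value"] for (name, entry) in calibrated)...)
    return merge(defaults, overrides)
end

params = overwrite_params(default_params, calibrated)
```

With this shape, changing which parameters are calibrated only changes the parameter file, and a misspelled name fails loudly instead of being silently ignored.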

  2. Configuring the calibration is not straightforward. All the settings are centralized in a single file as functions, but requiring the user to edit what those functions return does not seem ideal. Furthermore, it is not clear whether functions are the best way to pass this information to the worker processes.

  3. Adding a variable is difficult and not obvious. To add a new variable, you need to modify three different files, specify what the simulation and observational data are, specify how they should be preprocessed (e.g. unit conversion, shifting dates), and add any additional data transformations (e.g. seasonal averages). This process is error-prone, and the transformations can easily get out of sync.

  4. Getting the land-sea mask is indirect. To get a land-sea mask, you need to go through ClimaCore and ClimaAnalysis. This is not intuitive and can easily lead to errors if the diagnostics change.

  5. Mask-aware replace and flatten. In the calibration code, there is a hack to replace all the NaN values on land with the average non-NaN value on land. Furthermore, flatten removes all NaNs regardless of where they are. This means that any NaN on land can completely stop a calibration.
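
The difference between a plain flatten and a mask-aware one can be illustrated in plain Julia. This is a toy sketch, not the ClimaAnalysis implementation: `flatten_all`, `flatten_masked`, `nan_on_land`, and the arrays are hypothetical stand-ins.

```julia
# Toy 2x3 field. `true` in `land_mask` marks land; ocean cells hold NaN.
# The cell at (2, 1) is a NaN on land, which is the problematic case.
field = [1.0 NaN 3.0; NaN 5.0 NaN]
land_mask = [true false true; true true false]

# Plain flatten: drops every NaN, wherever it is, so a NaN on land
# silently shortens the resulting vector.
flatten_all(data) = filter(!isnan, vec(data))

# Mask-aware flatten: keeps exactly the land cells (column-major order),
# so the vector length depends only on the mask.
flatten_masked(data, mask) = data[mask]

# Indices of land cells that are NaN, so they can be reported explicitly.
nan_on_land(data, mask) = findall(isnan.(data) .& mask)

plain = flatten_all(field)
masked = flatten_masked(field, land_mask)
bad_cells = nan_on_land(field, land_mask)
```

Here `plain` has one element fewer than there are land cells, so it no longer lines up with an observation vector built from the same mask, while `masked` keeps its length and `bad_cells` pinpoints the offending cell.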

Update to packages

  • New release of ClimaAnalysis (needed for better masking functions)
  • New release of ClimaCalibrate (needed for updates to ObservationRecipes)
  • New release of EnsembleKalmanProcesses (needed for metadata for observations)

@ph-kev ph-kev marked this pull request as draft July 7, 2025 19:43