A pipeline for predicting sea ice.
You will need to install the following tools if you want to develop this project:

- `uv`, which is used to run all of the `zebra` commands below
Create a file in `config` called `<your chosen name here>.local.yaml`.
You will want this to inherit from `base.yaml` and then apply your own changes on top. For example, the following config will override the `base_path` option in `base.yaml`:
```yaml
defaults:
  - base

base_path: /local/path/to/my/data
```
You can then run this with, e.g.:

```sh
uv run zebra datasets create --config-name <your local config>.yaml
```
You can also use this config to override other options in the `base.yaml` file, as shown below:
```yaml
defaults:
  - base
  - override /model: encode_unet_decode  # Use this format if you want to use a different config

# Override specific model parameters
model:
  processor:
    start_out_channels: 37  # Use this format to override specific model parameters in the named configs

base_path: /local/path/to/my/data
```
Alternatively, you can apply overrides to specific options at the command line like this:

```sh
uv run zebra datasets create ++base_path=/local/path/to/my/data
```
Note that `persistence.yaml` overrides the specific options in `base.yaml` needed to run the `Persistence` model.
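For illustration only, a local config could inherit from it in the same way as from `base.yaml` (this is a sketch following the same Hydra conventions as above; the exact contents of `persistence.yaml` may differ):

```yaml
defaults:
  - persistence

base_path: /local/path/to/my/data
```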
As `uv` cannot easily be installed on Baskerville, you should install the `zebra` package directly into a virtual environment that you have set up:
```sh
source /path/to/venv/activate.sh
pip install -e .
```
This means that later commands like `uv run X ...` should simply be `X ...` instead.
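For example, the training command shown below would become:

```sh
zebra train
```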
You will need a CDS account to download data with `anemoi`.
Run `uv run zebra datasets create` to download all datasets locally.

Run `uv run zebra datasets inspect` to inspect all datasets available locally.

Run `uv run zebra train` to train using the datasets specified in the config.
ℹ️ This will save checkpoints to `${BASE_DIR}/training/wandb/run-${DATE}-${RANDOM_STRING}/checkpoints/${CHECKPOINT_NAME}.ckpt`.
Run `uv run zebra evaluate --checkpoint PATH_TO_A_CHECKPOINT` to evaluate using a checkpoint from a training run.
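For example, pointing at a checkpoint saved by the training step above (the run directory and checkpoint name will vary between runs):

```sh
uv run zebra evaluate --checkpoint ${BASE_DIR}/training/wandb/run-${DATE}-${RANDOM_STRING}/checkpoints/${CHECKPOINT_NAME}.ckpt
```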
An `ice-station-zebra` model needs to be able to run over multiple different datasets with different dimensions. These are structured in `NTCHW` format, where:

- `N` is the batch size
- `T` is the number of history (forecast) steps for inputs (outputs)
- `C` is the number of channels or variables
- `H` is a height dimension
- `W` is a width dimension
`N` and `T` will be the same for all inputs, but `C`, `H` and `W` might vary.
Taking as an example a batch size of 2 (`N=2`), 3 history steps and 4 forecast steps, we will have `k` inputs of shape `(2, 3, C_k, H_k, W_k)` and one output of shape `(2, 4, C_out, H_out, W_out)`.
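As a concrete sketch of these shapes using PyTorch (the dataset names and the channel and spatial sizes below are invented for illustration):

```python
import torch

N, T_in, T_out = 2, 3, 4  # batch size, history steps, forecast steps

# k = 2 hypothetical input datasets with different C, H and W
inputs = {
    "atmosphere": torch.randn(N, T_in, 10, 121, 240),  # (N, T, C_1, H_1, W_1)
    "sea_ice": torch.randn(N, T_in, 1, 432, 432),      # (N, T, C_2, H_2, W_2)
}

# A single output, e.g. sea-ice concentration on its own grid
output_shape = (N, T_out, 1, 432, 432)  # (N, T, C_out, H_out, W_out)
```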
A standalone model will need to accept a `dict[str, TensorNTCHW]` which maps dataset names to an `NTCHW` tensor of values. The model might want to use one or more of these for training, and will need to produce an output with shape `(N, T, C_out, H_out, W_out)`. As can be seen in the example below, a separate instance of the model is likely to be needed for each output to be predicted.
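A minimal sketch of this interface, assuming PyTorch (the class name, constructor arguments and zero-filled output are hypothetical placeholders; only the input type and the output shape come from the description above):

```python
import torch
from torch import nn

# Alias for readability; tensors are assumed to be in NTCHW format
TensorNTCHW = torch.Tensor


class StandaloneSketch(nn.Module):
    """Hypothetical standalone model that predicts a single output dataset."""

    def __init__(self, t_out: int, c_out: int, h_out: int, w_out: int) -> None:
        super().__init__()
        self.out_shape = (t_out, c_out, h_out, w_out)

    def forward(self, inputs: dict[str, TensorNTCHW]) -> TensorNTCHW:
        # Use whichever input datasets the model needs; N and T are shared
        batch_size = next(iter(inputs.values())).shape[0]
        # ... real model logic would go here ...
        return torch.zeros(batch_size, *self.out_shape)  # (N, T, C_out, H_out, W_out)
```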
Pros:
- all input variables are available without transformation
Cons:
- hard to add new inputs
- hard to add new outputs
A processor model is part of a larger encode-process-decode step.
Start by defining a latent space as `(C_latent, H_latent, W_latent)`; in the example below, this has been set to `(10, 64, 64)`. The encode-process-decode model automatically creates one encoder for each input and one decoder for each output. The dataset-specific encoder takes the input data and converts it to shape `(N, C_latent, H_latent, W_latent)`, compressing the time and channel dimensions. The `k` encoded datasets can then be combined in latent space to give a single dataset of shape `(N, k * C_latent, H_latent, W_latent)`. This is then passed to the processor, which must accept input of shape `(N, k * C_latent, H_latent, W_latent)` and produce output of the same shape. This output is then passed to one or more output-specific decoders, each of which takes input of shape `(N, k * C_latent, H_latent, W_latent)` and produces output of shape `(N, T, C_out, H_out, W_out)`, regenerating the time dimension.
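A minimal sketch of how these shapes flow through the pipeline, assuming PyTorch (the random tensors stand in for real encoders and `nn.Identity` for a real processor; the sizes match the example above):

```python
import torch
from torch import nn

N, T_out = 2, 4                             # batch size and forecast steps
k = 2                                       # number of input datasets
C_latent, H_latent, W_latent = 10, 64, 64   # latent space from the example above

# Each dataset-specific encoder compresses time and channels:
#   (N, T, C_k, H_k, W_k) -> (N, C_latent, H_latent, W_latent)
encoded = [torch.randn(N, C_latent, H_latent, W_latent) for _ in range(k)]

# Combine in latent space along the channel dimension
latent = torch.cat(encoded, dim=1)          # (N, k * C_latent, H_latent, W_latent)

# The processor must map this shape onto itself
processor = nn.Identity()                   # stand-in for a real processor
processed = processor(latent)
assert processed.shape == latent.shape

# Each output-specific decoder then regenerates the time dimension:
#   (N, k * C_latent, H_latent, W_latent) -> (N, T_out, C_out, H_out, W_out)
```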
Pros:
- easy to add new inputs
- easy to add new outputs
Cons:
- input variables have been transformed into latent space
- time-step information has been compressed into the latent space