This repository contains code and a data pipeline for experimenting with deep learning models for short-term solar radiation nowcasting, using geostationary satellite images. It includes implementations and training workflows for both DGMR-SO and Earthformer architectures.
Inside the /RawData
folder of this repository, you will find the script used to prepare the raw satellite data. Furthermore this folder contains the dataset class used to prepare the rolling windows.
Creates a geographic subset of the original data, focusing on a specific region that includes the Netherlands, Belgium, France, England, and parts of neighboring countries.
Usage:
python msgcpp_reduce_subset.py
Once the subset is generated, split it into a training and testing set. For convenience, place them in folders called: raw_train_data
and raw_test_data
.
Defines a custom PyTorch Dataset class (NetCDFNowcastingDataset) for loading and preprocessing satellite data stored in NetCDF files. This dataset creates sliding temporal windows of input and target frames for nowcasting tasks.
Main features:
- Loads time-sequenced NetCDF files sorted by timestamp.
- Automatically filters out invalid samples (files with too many dark pixels or broken time intervals).
- Returns data in the shape required for model input: a concatenated tensor of input and target sequences.
DGMR-SO is an adapted version of the Deep Generative Model for Radar (DGMR) by Google DeepMind, tailored for satellite-based Surface Solar Irradiance (SSI) nowcasting. Instead of radar data, this implementation uses geostationary satellite imagery from the SEVIRI instrument aboard the Meteosat-11 satellite, normalized to represent clear-sky index for improved stability. The task is to predict the next 4 hours (16 future timesteps) of SSI based on the past 1 hour (4 timesteps) using rolling windows.
The original paper: https://www.sciencedirect.com/science/article/pii/S0038092X24005619#b0240
!IMPORTANT make sure conda, cuda and cuDNN are avaialble/downloaded!
For Virtual Machine on EUMETSAT, follow these steps: https://confluence.ecmwf.int/display/EWCLOUDKB/EUMETSAT+-+GPU+support
- Clone the GitHub repository
- Move into DGMR_SO folder
- Create conda environment:
conda create -n dgmr_env python=3.9 -y
- Activate conda environment:
conda activate dgmr_env
- Install the required packages:
pip install tensorflow[and-cuda]
pip install matplotlib
pip install dm-sonnet
pip install pyyaml
pip install tensorboard
After these steps you should be able the execute the next steps.
Converts the subset .nc files into TFRecord format, using the NetCDFNowcastingDataset class.
- Requires the helper file tfrecord_shards_for_nowcasting.py. This file is custom made to create TFRecords which are suitable for nowcasting.
- Creates a 80/20 split for the train and validation dataset.
Usage:
python data_preperation_tfr.py
Update the script with the correct input/output paths and match the height and width to your selected geographic region.
After the data processing and quality control, the data_pipleine.py
file is eventually used by the models to make use of the TFRecord datasets.
It performs:
- Random cropping.
- Flipping (horizontal & vertical).
- Contrast adjustements.
Before training an instance of DGMR-SO, make sure the train, validation and test sets are in designated folders in the /Data directory. Furthermore, make sure you are in the DGMR-SO directory and have created an virtual environment desribed in the section above.
Afterwards typ the following command in the console:
python train.py
This command will execute the full training process for 500.000 steps (or otherwise defined in the train.yml). There is a high change the full 500.00 steps are not necessary. We recommend the use of tensorboard to watch and evaluate the training process closely.
When working on a EUMETSAT Virtual Machine and you wat to run tensorboard locally, make sure you connect to the VM as follows:
ssh -L 6006:localhost:6006 'user'@'IP_of_VM'
When connected succesfully run the following command to load and view tensorboard on localhost:6006
:
tensorboard --logdir='directory_to_train_logs' --port=6006 --host=localhost
EarthFormer is a space-time transformer architecture designed for Earth system forecasting tasks, such as weather nowcasting and precipitation prediction. Unlike traditional CNN or ConvLSTM-based models, EarthFormer leverages spatiotemporal attention mechanisms to effectively capture both short- and long-range dependencies across space and time. The transformer Utilizes axial attention and factorized self-attention to reduce complexity while preserving global context. In this project, EarthFormer is adapted to nowcast Surface Solar Irradiance (SSI) using satellite-derived input data.
The original paper: https://www.amazon.science/publications/earthformer-exploring-space-time-transformers-for-earth-system-forecasting
!IMPORTANT make sure conda, cuda and cuDNN are avaialble/downloaded!
For Virtual Machine on EUMETSAT, follow these steps: https://confluence.ecmwf.int/display/EWCLOUDKB/EUMETSAT+-+GPU+support
- Clone the GitHub repository
- Move into EarthFormer folder
- Create conda environment:
conda create -n ef_env python=3.9 -y
- Activate conda environment:
conda activate ef_env
- Install the required packages:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install pytorch_lightning==2.5.1
pip install torchmetrics==2.5.1
pip install xarray netcdf4 opencv-python earthnet==0.3.9
pip install -U -e . --no-build-isolation
After these steps you should be able the execute the next steps.
Converts the subset .nc files into h5 format, using the NetCDFNowcastingDataset class.
- Creates a 80/20 split for the train and validation dataset.
Usage:
python data_preperation_to_h5.py
Update the script with the correct input/output paths and match the height and width to your selected geographic region.
Before training an instance of EarthFormer, make sure the train, validation and test sets are in designated folders in the /Data directory. Furthermore, make sure you are in the EarthFormer directory and have created an virtual environment desribed in the section above.
Afterwards typ the following command in the console:
python train.py --gpu 1 --cfg config/train.yml --ckpt_name last.ckpt --save "version name"
This command will execute the full training process on the dataset. We recommend the use of tensorboard to watch and evaluate the training process closely.
When working on a EUMETSAT Virtual Machine and you wat to run tensorboard locally, make sure you connect to the VM as follows:
ssh -L 6006:localhost:6006 'user'@'IP_of_VM'
When connected succesfully run the following command to load and view tensorboard on localhost:6006
:
tensorboard --logdir='directory_to_train_logs' --port=6006 --host=localhost
To compare DGMR-SO, EarthFormer and a persistence model, we created a script which evaluates all models on the metrics defined in the report. Furthermore, it generates image predictions and metric plots and saves them as PNGs.
!IMPORTANT make sure conda, cuda and cuDNN are avaialble/downloaded!
For Virtual Machine on EUMETSAT, follow these steps: https://confluence.ecmwf.int/display/EWCLOUDKB/EUMETSAT+-+GPU+support
- Move into Comparison folder
- Create conda environment:
conda create -n comp_env python=3.9 -y
- Activate conda environment:
conda activate comp_env
- Install the required packages:
pip install tensorflow[and-cuda]
pip install matplotlib
pip install dm-sonnet
pip install pyyaml
pip install tensorboard
pip install scikit-learn
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install pytorch_lightning==2.5.1
pip install xarray netcdf4 opencv-python earthnet==0.3.9
pip install -U -e . --no-build-isolation
pip install psutil
After these steps you should be able the execute the next steps.
Make sure the paths in the compare_models.py
script are correclty defined. Afterwards run:
python compare_models.py
The images and metric scores will be saved in a designated folder.