Satellite-Based Flood Detection Pipeline

Using satellite imagery for detecting and mapping floodwater

Project Description

Floods are among the most destructive natural disasters and affect millions of people each year. Rapid, accurate flood mapping is vital for disaster response and mitigation. This project provides a pipeline for detecting and mapping floodwater from satellite imagery, intended for researchers, data scientists, and emergency responders.

The pipeline relies on Synthetic Aperture Radar (SAR) data from Sentinel-1, which can image the surface through cloud cover and at night, with optional integration of high-resolution optical data from Sentinel-2. It offers two distinct approaches:

Rapid Response: A lightweight, classical machine learning model (Random Forest) that can be trained and deployed quickly in emergencies.
High-Accuracy: A deep learning model (U-Net) that trades higher compute cost for better accuracy in research and operational use.

Features

End-to-End Workflow: Covers data acquisition, preprocessing, training, evaluation, and inference.
Dual-Sensor Support: Primarily uses Sentinel-1 (SAR), with optional Sentinel-2 (Optical) fusion.
Two Model Types: Offers classical ML (Random Forest) and deep learning (U-Net) options.
Standardized Preprocessing: Integrates the ESA SNAP Engine for consistent Analysis-Ready Data (ARD).
Benchmark Datasets: Compatible with Sen1Floods11 and WorldFloods for training and validation.
Imbalance-Aware Metrics: Evaluates performance with IoU and F1-Score, which stay informative when flooded pixels are a small minority of the image.

Project Structure

Organized as a Python package for streamlined setup and usage:

flood-detection-pipeline/
├── data/                    # Data storage (not tracked by Git)
│   ├── raw/                 # Raw satellite imagery
│   ├── processed/           # Preprocessed, analysis-ready data
│   └── training_data/       # Labeled datasets (e.g., Sen1Floods11)
│
├── notebooks/               # Exploratory Jupyter notebooks
│
├── results/                 # Outputs: models and flood maps
│   ├── models/              # Trained model weights
│   └── maps/                # Generated flood maps (GeoTIFFs)
│
├── src/
│   └── flood_detector/      # Core Python package
│       ├── __init__.py
│       ├── config.py        # Configuration settings
│       ├── data_handler.py  # Data loading and prep
│       ├── evaluate.py      # Evaluation metrics
│       ├── models/          # Model definitions
│       │   ├── random_forest.py
│       │   └── unet.py
│       ├── preprocess.py    # Preprocessing scripts
│       ├── train.py         # Training logic
│       └── predict.py       # Inference scripts
│
├── .gitignore
├── LICENSE
├── README.md
└── pyproject.toml           # Poetry dependency file

Methodology

The pipeline offers two complementary pathways tailored to different needs:

| Feature | Path A: Rapid Response | Path B: High-Accuracy |
| --- | --- | --- |
| Algorithm | Random Forest | U-Net |
| Data | Sentinel-1 (pre/post) | Sentinel-1 + Sentinel-2 |
| Compute Needs | Low | High (GPU recommended) |
| Use Case | Emergency response | Research/operations |

Path A: Rapid Response / Proof-of-Concept

Algorithm: Random Forest Classifier
Data: Sentinel-1 GRD (pre- and post-flood)
Features: VV/VH backscatter, change detection, DEM derivatives (slope, elevation)
Purpose: Fast deployment with minimal resources for urgent scenarios
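
As a rough illustration of the feature list above, the sketch below stacks per-pixel features for a Random Forest from co-registered rasters. The function names, the conversion to decibels, and the choice of post-minus-pre differences as the change-detection feature are assumptions made for this example, not the project's confirmed implementation.

```python
import numpy as np


def to_db(linear, floor=1e-6):
    """Convert linear backscatter to decibels, clipping to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(linear, floor))


def build_rf_features(pre_vv, pre_vh, post_vv, post_vh, elevation, slope):
    """Stack per-pixel features into an (n_pixels, n_features) matrix.

    All inputs are 2-D arrays of identical shape (co-registered rasters).
    This layout is hypothetical and only illustrates the idea.
    """
    post_vv_db, post_vh_db = to_db(post_vv), to_db(post_vh)
    # Change detection: open water lowers backscatter, so flooded pixels
    # tend to show a negative post-minus-pre difference in dB.
    diff_vv = post_vv_db - to_db(pre_vv)
    diff_vh = post_vh_db - to_db(pre_vh)
    layers = [post_vv_db, post_vh_db, diff_vv, diff_vh, elevation, slope]
    return np.stack([layer.ravel() for layer in layers], axis=1)


# Synthetic 4x4 scene: backscatter drops by a factor of 10 after the flood.
pre = np.full((4, 4), 0.1)
post = np.full((4, 4), 0.01)
flat = np.zeros((4, 4))
features = build_rf_features(pre, pre, post, post, flat, flat)
print(features.shape)
```

Each row of the resulting matrix describes one pixel, which is the shape scikit-learn classifiers expect.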

Path B: High-Accuracy / Research-Grade

Algorithm: U-Net (or variants like Attention U-Net)
Data: Fused Sentinel-1 (SAR) and Sentinel-2 (Optical)
Features: Multi-channel tensor with SAR, optical bands, and HAND index
Purpose: Superior accuracy for research and operational systems
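
A minimal sketch of how such a multi-channel tensor could be assembled, assuming the SAR bands, optical bands, and HAND (Height Above Nearest Drainage) raster are already co-registered on the same grid. The function name, channel order, and per-channel standardization are illustrative assumptions rather than the project's actual code.

```python
import numpy as np


def stack_channels(sar, optical, hand):
    """Build an (H, W, C) float32 tensor from co-registered rasters.

    sar:     (H, W, 2) array, e.g. VV and VH
    optical: (H, W, B) array of Sentinel-2 bands
    hand:    (H, W) Height Above Nearest Drainage
    """
    tensor = np.concatenate(
        [sar, optical, hand[..., np.newaxis]], axis=-1
    ).astype(np.float32)
    # Standardize each channel so SAR, reflectance, and HAND values
    # share a comparable numeric range.
    mean = tensor.mean(axis=(0, 1), keepdims=True)
    std = tensor.std(axis=(0, 1), keepdims=True)
    return (tensor - mean) / np.maximum(std, 1e-6)


rng = np.random.default_rng(0)
x = stack_channels(
    rng.normal(size=(8, 8, 2)),   # synthetic SAR
    rng.uniform(size=(8, 8, 4)),  # synthetic optical bands
    rng.uniform(0, 30, size=(8, 8)),  # synthetic HAND
)
print(x.shape)
```

In practice the normalization statistics would usually be computed once over the training set and reused at inference, rather than per image as in this toy example.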

Data Sources

Two-Phase Data Strategy

Training Phase ("Textbook")
- Datasets: Sen1Floods11, WorldFloods
- Goal: Teach models flood patterns using labeled data
Prediction Phase ("Real-World")
- Datasets: User-downloaded event imagery
- Goal: Apply trained models to new flood events

Data Requirements

⚠️ Note: Input needs vary by model:

Random Forest (Path A): 2 images (pre- and post-flood)
U-Net (Path B): 1 image (post-flood only)

Sources

Primary Imagery:
- Sentinel-1 (SAR)
- Sentinel-2 (Optical)
Ancillary Data:
- Copernicus DEM
Training Datasets:
- Sen1Floods11 (~14GB)
- WorldFloods (~76GB)

Installation

Clone the Repository:

git clone https://github.com/your-username/flood-detection-pipeline.git
cd flood-detection-pipeline

Install ESA SNAP: Download and install ESA SNAP, then configure its Python interface (snappy) per ESA’s instructions.
Install Poetry:
```
pip install poetry
```
Install Dependencies:
```
poetry install
```
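Activate Environment (optional, since the Usage commands rely on poetry run; on Poetry 2.x the shell command requires the poetry-plugin-shell plugin):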
```
poetry shell
```

Dataset Acquisition Guide

Part 1: Benchmark Datasets

Sen1Floods11 (SAR)

Size: ~14GB

Steps:

mkdir -p data/training_data/Sen1Floods11
gsutil -m rsync -r gs://sen1floods11 data/training_data/Sen1Floods11/

Requires the Google Cloud SDK, which provides gsutil.

WorldFloods (Optical)

Size: ~76GB

Steps:

pip install huggingface_hub
mkdir -p data/training_data/WorldFloods
huggingface-cli download isp-uv-es/WorldFloodsv2 --repo-type dataset --local-dir data/training_data/WorldFloods

Part 2: Event Imagery (e.g., Limburg Floods, July 2021)

Location: Valkenburg aan de Geul, Netherlands (50.8661°N, 5.8308°E)
Source: Copernicus Dataspace
Steps:
1. Register and log in.
2. Search area and filter:
  - Sentinel-1: GRD, June 1–July 18, 2021
  - Sentinel-2: Level-2A, 0–10% cloud, same period
3. Download and move to data/raw/.
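
When searching by area, it can help to have the region of interest as a WKT polygon, a text format that catalogue search tools commonly accept. The sketch below builds a square box around the coordinates given above. The helper name and the 10 km half-width are assumptions for illustration, and the degree conversion is a simple approximation that is adequate for a search footprint.

```python
import math


def bbox_wkt(lat, lon, half_width_km=10.0):
    """Return a WKT polygon for a square box centred on (lat, lon)."""
    dlat = half_width_km / 111.32  # roughly km per degree of latitude
    dlon = half_width_km / (111.32 * math.cos(math.radians(lat)))
    west, east = lon - dlon, lon + dlon
    south, north = lat - dlat, lat + dlat
    # WKT rings list x (longitude) before y (latitude) and must be closed.
    corners = [
        (west, south), (east, south), (east, north), (west, north),
        (west, south),
    ]
    ring = ", ".join(f"{x:.4f} {y:.4f}" for x, y in corners)
    return f"POLYGON(({ring}))"


# Valkenburg aan de Geul, from the location listed above.
area = bbox_wkt(50.8661, 5.8308)
print(area)
```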

Usage

Run the scripts as modules from the repository root; they live in the src/flood_detector/ package.

1. Preprocessing

poetry run python -m src.flood_detector.preprocess \
  --input_path data/raw/S1A_...zip \
  --output_path data/processed/S1A_processed.tif \
  --graph_xml path/to/graph.xml
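
The internals of the preprocess script are not shown here. As a rough illustration of what running a SNAP graph involves, SNAP ships a command-line tool, gpt, that takes a graph XML file plus -P parameters. The sketch below only assembles such a command. The parameter names input and output and the file names are hypothetical, and the parameter names must match the placeholders defined in the graph file actually used.

```python
import shlex


def build_gpt_command(graph_xml, input_path, output_path):
    """Return the argument list for running a SNAP graph with gpt.

    The parameter names "input" and "output" are hypothetical; they must
    match the ${...} placeholders defined in the graph XML.
    """
    return [
        "gpt",
        str(graph_xml),
        f"-Pinput={input_path}",
        f"-Poutput={output_path}",
    ]


cmd = build_gpt_command(
    "path/to/graph.xml",
    "data/raw/scene.zip",  # hypothetical file name
    "data/processed/scene_processed.tif",  # hypothetical file name
)
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.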

2. Training a Model

Random Forest:

poetry run python -m src.flood_detector.train \
  --model_type random_forest \
  --training_data data/training_data/my_rf_samples.csv \
  --model_output_path results/models/rf_model_v1.joblib
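
For orientation, here is a minimal sketch of what training a Random Forest from a CSV of samples and saving it as a .joblib file might look like with scikit-learn. The CSV layout (one header row, feature columns first, a 0/1 flood label last), the function name, and the hyperparameters are assumptions for this example and may differ from what train.py actually does.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_rf_from_csv(csv_path, model_path):
    """Fit a Random Forest on a CSV of per-pixel samples and save it.

    Assumed layout (hypothetical): one header row, feature columns first,
    and a 0/1 flood label in the last column.
    """
    table = np.loadtxt(csv_path, delimiter=",", skiprows=1)
    X, y = table[:, :-1], table[:, -1].astype(int)
    model = RandomForestClassifier(
        n_estimators=100,
        class_weight="balanced",  # flooded pixels are usually the minority
        random_state=0,
    )
    model.fit(X, y)
    joblib.dump(model, model_path)
    return model


# Tiny synthetic demo: flooded pixels get lower backscatter (in dB).
rng = np.random.default_rng(0)
dry = np.column_stack(
    [rng.normal(-8, 1, 200), rng.normal(-15, 1, 200), np.zeros(200)]
)
wet = np.column_stack(
    [rng.normal(-20, 1, 40), rng.normal(-27, 1, 40), np.ones(40)]
)
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "samples.csv")
    model_path = os.path.join(tmp, "rf_model.joblib")
    np.savetxt(csv_path, np.vstack([dry, wet]), delimiter=",",
               header="vv_db,vh_db,label", comments="")
    train_rf_from_csv(csv_path, model_path)
    reloaded = joblib.load(model_path)

predictions = reloaded.predict([[-21.0, -28.0], [-7.0, -14.0]])
print(predictions)
```

The class_weight="balanced" setting is one common way to keep the minority flood class from being ignored during training.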

U-Net:

poetry run python -m src.flood_detector.train \
  --model_type unet \
  --training_data data/training_data/Sen1Floods11/ \
  --model_output_path results/models/unet_model_v1.h5 \
  --epochs 10 \
  --batch_size 1

Note: Training runs a sample inference after each epoch and saves the output to results/training_logs/.

3. Running Inference

poetry run python -m src.flood_detector.predict \
  --model_path results/models/unet_model_v1.h5 \
  --input_image data/processed/post_flood_event.tif \
  --output_map results/maps/flood_map_event.tif

Example on a Sen1Floods11 sample image:

poetry run python -m src.flood_detector.predict \
  --model_path results/models/unet_final_test_best.h5 \
  --post_flood_image data/training_data/Sen1Floods11/train/s1_post/Bolivia_76104_S1Hand.tif \
  --output_map results/maps/flood_map_sample.tif

Evaluation

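Flooded pixels are usually a small fraction of a scene, so plain pixel accuracy can look high even when most floodwater is missed. The pipeline therefore reports:

IoU: Area of overlap between predicted and reference flood masks, divided by the area of their union
F1-Score: Harmonic mean of precision and recall
Precision: Fraction of pixels predicted as flood that are actually flooded
Recall: Fraction of actually flooded pixels that the model detects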
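
The four metrics can be computed directly from the counts of true positives, false positives, and false negatives. The sketch below shows one way to do that with NumPy for binary masks. The function name and the zero-division handling are choices made for this example and are not taken from evaluate.py.

```python
import numpy as np


def flood_metrics(pred, truth):
    """Compute IoU, F1, precision and recall for binary flood masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # flood predicted, flood present
    fp = np.sum(pred & ~truth)   # flood predicted, no flood present
    fn = np.sum(~pred & truth)   # flood missed

    def safe_div(num, den):
        # Return 0.0 when a metric is undefined (empty denominator).
        return float(num) / float(den) if den > 0 else 0.0

    return {
        "iou": safe_div(tp, tp + fp + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
    }


truth = np.array([[1, 1, 0, 0], [0, 0, 0, 0]])
pred = np.array([[1, 0, 1, 0], [0, 0, 0, 0]])
scores = flood_metrics(pred, truth)
print(scores)
```

In this toy case 6 of 8 pixels are classified correctly (75% accuracy), yet IoU is only 1/3 and F1 is 0.5, which is why these metrics are preferred for imbalanced flood masks.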

Contributing

Contributions are encouraged! Fork, branch, and submit a pull request.

License

MIT License. See LICENSE for details.

Acknowledgments

ESA Copernicus Programme
NASA/USGS Landsat Program
Cloud to Street (Sen1Floods11)


soheil-mp/flood-detection-pipeline
