
soheil-mp/flood-detection-pipeline


Satellite-Based Flood Detection Pipeline


Using satellite imagery for detecting and mapping floodwater

Flood Detection Pipeline


Project Description

Floods are among the most destructive natural disasters, affecting millions of people worldwide each year. Rapid, accurate flood mapping is essential for disaster response and mitigation. This project provides a scalable pipeline for detecting floodwater in satellite imagery, aimed at researchers, data scientists, and emergency responders.

The pipeline is built on Synthetic Aperture Radar (SAR) data from Sentinel-1, which can image the ground through cloud cover and at night, with optional integration of high-resolution optical data from Sentinel-2. It offers two approaches:

  • Rapid Response: A lightweight classical machine learning model (Random Forest) that can be deployed quickly in emergencies.
  • High-Accuracy: A deep learning model (U-Net) for more precise mapping in research and operational settings.

Features

  • End-to-End Workflow: Covers data acquisition, preprocessing, training, evaluation, and inference.
  • Dual-Sensor Support: Primarily uses Sentinel-1 (SAR), with optional Sentinel-2 (Optical) fusion.
  • Two Model Families: Classical ML (Random Forest) and deep learning (U-Net), so accuracy can be traded against compute.
  • Standardized Preprocessing: Integrates the ESA SNAP Engine for consistent Analysis-Ready Data (ARD).
  • Benchmark Datasets: Compatible with Sen1Floods11 and WorldFloods for training and validation.
  • Imbalance-Aware Metrics: Reports IoU and F1-Score, which stay informative when flooded pixels are a small minority of each scene.

Project Structure

Organized as a Python package for streamlined setup and usage:

flood-detection-pipeline/
├── data/                    # Data storage (not tracked by Git)
│   ├── raw/                 # Raw satellite imagery
│   ├── processed/           # Preprocessed, analysis-ready data
│   └── training_data/       # Labeled datasets (e.g., Sen1Floods11)
│
├── notebooks/               # Exploratory Jupyter notebooks
│
├── results/                 # Outputs: models and flood maps
│   ├── models/              # Trained model weights
│   └── maps/                # Generated flood maps (GeoTIFFs)
│
├── src/
│   └── flood_detector/      # Core Python package
│       ├── __init__.py
│       ├── config.py        # Configuration settings
│       ├── data_handler.py  # Data loading and prep
│       ├── evaluate.py      # Evaluation metrics
│       ├── models/          # Model definitions
│       │   ├── random_forest.py
│       │   └── unet.py
│       ├── preprocess.py    # Preprocessing scripts
│       ├── train.py         # Training logic
│       └── predict.py       # Inference scripts
│
├── .gitignore
├── LICENSE
├── README.md
└── pyproject.toml           # Poetry dependency file

Methodology

The pipeline offers two complementary pathways tailored to different needs:

Feature         Path A: Rapid Response     Path B: High-Accuracy
Algorithm       Random Forest              U-Net
Data            Sentinel-1 (pre/post)      Sentinel-1 + Sentinel-2
Compute needs   Low                        High (GPU recommended)
Use case        Emergency response         Research/operations

Path A: Rapid Response / Proof-of-Concept

  • Algorithm: Random Forest Classifier
  • Data: Sentinel-1 GRD (pre- and post-flood)
  • Features: VV/VH backscatter, change detection, DEM derivatives (slope, elevation)
  • Purpose: Fast deployment with minimal resources for urgent scenarios
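The Path A feature set can be sketched in NumPy as below. This is a minimal illustration under assumptions, not the repository's implementation: the function names, the feature order, and the 10 m pixel spacing are hypothetical, and all inputs are assumed to be co-registered 2-D arrays with backscatter already in dB.

```python
import numpy as np


def slope_degrees(dem, pixel_size=10.0):
    """Slope in degrees from a DEM grid; pixel_size is the ground spacing in metres (assumed)."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), pixel_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def build_rf_features(pre_vv, pre_vh, post_vv, post_vh, dem, pixel_size=10.0):
    """Stack per-pixel features into an (n_pixels, 8) matrix for a Random Forest.

    Hypothetical feature order: pre VV, pre VH, post VV, post VH,
    VV change, VH change, elevation, slope.
    """
    layers = [
        pre_vv, pre_vh, post_vv, post_vh,
        post_vv - pre_vv,  # change detection: open floodwater typically lowers backscatter
        post_vh - pre_vh,
        dem,
        slope_degrees(dem, pixel_size),
    ]
    stack = np.stack([np.asarray(layer, dtype=float) for layer in layers], axis=-1)
    # Flatten (H, W, 8) to one row per pixel, in row-major order.
    return stack.reshape(-1, stack.shape[-1])
```

Each row of the returned matrix describes one pixel, which is the tabular shape a scikit-learn style classifier expects; the flattened rows can be reshaped back to (H, W) after prediction.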

Path B: High-Accuracy / Research-Grade

  • Algorithm: U-Net (or variants like Attention U-Net)
  • Data: Fused Sentinel-1 (SAR) and Sentinel-2 (Optical)
  • Features: Multi-channel tensor with SAR, optical bands, and HAND index
  • Purpose: Higher accuracy for research and operational systems, at the cost of more compute
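A minimal sketch of assembling the Path B multi-channel input, assuming co-registered 2-D layers (for example VV and VH in dB, a few Sentinel-2 bands, and HAND). The function name and the per-image standardization are illustrative assumptions; the repository's actual band selection and normalization are not documented here.

```python
import numpy as np


def build_input_tensor(channels, nodata_fill=0.0):
    """Stack co-registered 2-D layers into an (H, W, C) float32 tensor.

    Each channel is standardised to zero mean and unit variance over its
    finite pixels; NaNs (e.g. cloud-masked optical pixels) are then
    replaced by nodata_fill.
    """
    out = []
    for layer in channels:
        layer = np.asarray(layer, dtype=np.float64)
        valid = np.isfinite(layer)
        if valid.any():
            mean = layer[valid].mean()
            std = layer[valid].std()
        else:
            mean, std = 0.0, 0.0
        # Avoid dividing by zero for constant channels.
        scaled = (layer - mean) / std if std > 0 else layer - mean
        out.append(np.where(valid, scaled, nodata_fill))
    return np.stack(out, axis=-1).astype(np.float32)
```

In practice, normalization statistics are usually computed once over the training set and reused at inference so that every scene is scaled consistently; per-image statistics keep this sketch short.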

Data Sources

Two-Phase Data Strategy

  • Training Phase ("Textbook")

    • Datasets: Sen1Floods11, WorldFloods
    • Goal: Teach models flood patterns using labeled data
  • Prediction Phase ("Real-World")

    • Datasets: User-downloaded event imagery
    • Goal: Apply trained models to new flood events

Data Requirements

⚠️ Note: Input needs vary by model:

  • Random Forest (Path A): 2 images (pre- and post-flood)
  • U-Net (Path B): 1 image (post-flood only)

Sources

Installation

  1. Clone the Repository:

    git clone https://github.com/soheil-mp/flood-detection-pipeline.git
    cd flood-detection-pipeline
  2. Install ESA SNAP: Download and install ESA SNAP and configure snappy per ESA’s instructions.

  3. Install Poetry:

    pip install poetry
  4. Install Dependencies:

    poetry install
  5. Activate Environment:

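    poetry shell
    On Poetry 2.x, the shell command is provided by the poetry-plugin-shell plugin; alternatively, prefix each command with poetry run, as in the Usage section.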

Dataset Acquisition Guide

Part 1: Benchmark Datasets

Sen1Floods11 (SAR)

  • Size: ~14GB
  • Steps:
    mkdir -p data/training_data/Sen1Floods11
    gsutil -m rsync -r gs://sen1floods11 data/training_data/Sen1Floods11/
    Requires the Google Cloud SDK, which provides gsutil.

WorldFloods (Optical)

  • Size: ~76GB
  • Steps:
    pip install huggingface_hub
    mkdir -p data/training_data/WorldFloods
    huggingface-cli download isp-uv-es/WorldFloodsv2 --repo-type dataset --local-dir data/training_data/WorldFloods

Part 2: Event Imagery (e.g., Limburg Floods, July 2021)

  • Location: Valkenburg aan de Geul, Netherlands (50.8661°N, 5.8308°E)
  • Source: Copernicus Data Space Ecosystem
  • Steps:
    1. Register and log in.
    2. Search the area and apply filters:
      • Sentinel-1: GRD, June 1–July 18, 2021
      • Sentinel-2: Level-2A, 0–10% cloud, same period
    3. Download the products and move them to data/raw/.
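When searching the catalogue, the area of interest can be given as a polygon. The helper below is a hypothetical sketch that builds a WKT rectangle around the Valkenburg coordinates; the 5 km half-width and the flat-Earth degree conversion (about 111.32 km per degree of latitude) are assumptions that are adequate only for small areas.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate; adequate for a small search box


def bbox_wkt(lat, lon, half_width_km=5.0):
    """Return a WKT POLYGON (longitude latitude order) for a square box centred on (lat, lon)."""
    dlat = half_width_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with latitude by a factor of cos(lat).
    dlon = half_width_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    west, east = lon - dlon, lon + dlon
    south, north = lat - dlat, lat + dlat
    # WKT rings are closed: the first vertex is repeated at the end.
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = ", ".join(f"{x:.5f} {y:.5f}" for x, y in ring)
    return f"POLYGON(({coords}))"
```

For example, bbox_wkt(50.8661, 5.8308) returns a closed five-vertex POLYGON covering roughly 10 km by 10 km around Valkenburg aan de Geul, usable wherever a WKT area of interest is accepted.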

Usage

Run all commands from the repository root; each step is a module of the src/flood_detector/ package invoked with python -m.

1. Preprocessing

poetry run python -m src.flood_detector.preprocess \
  --input_path data/raw/S1A_...zip \
  --output_path data/processed/S1A_processed.tif \
  --graph_xml path/to/graph.xml
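The SNAP graph passed via --graph_xml defines the actual processing chain. For orientation only, the NumPy sketch below shows two steps that Sentinel-1 GRD workflows commonly include: speckle reduction with a simplified Lee filter and conversion of linear sigma0 to decibels. It is not derived from the repository's graph, and the window size and the use of the global image variance as the noise estimate are simplifying assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def to_db(sigma0_linear, floor=1e-6):
    """Convert linear backscatter to decibels; values below `floor` are clipped to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(sigma0_linear, floor))


def lee_filter(img, size=5):
    """Simplified Lee speckle filter for a 2-D array of linear backscatter."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    img = np.asarray(img, dtype=float)
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    noise_var = img.var()  # global variance as a crude noise estimate (assumption)
    denom = local_var + noise_var
    # Lower weight in homogeneous areas (stronger smoothing), higher near edges (detail kept).
    weights = np.divide(local_var, denom, out=np.zeros_like(local_var), where=denom > 0)
    return local_mean + weights * (img - local_mean)
```

Speckle filtering is normally applied to the linear backscatter, with the dB conversion afterwards, e.g. to_db(lee_filter(sigma0_vv, 5)).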

2. Training a Model

  • Random Forest:

    poetry run python -m src.flood_detector.train \
      --model_type random_forest \
      --training_data data/training_data/my_rf_samples.csv \
      --model_output_path results/models/rf_model_v1.joblib
  • U-Net:

    poetry run python -m src.flood_detector.train \
      --model_type unet \
      --training_data data/training_data/Sen1Floods11/ \
      --model_output_path results/models/unet_model_v1.h5 \
      --epochs 10 \
      --batch_size 1

    Note: After each epoch, training also runs inference on a sample and saves the output to results/training_logs/.
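The layout of my_rf_samples.csv is not documented here. Because flooded pixels are rare, one reasonable way to build such a file is to draw an equal number of pixels per class. The sketch below assumes per-pixel features (for example from a Path A style feature stack) and integer labels where -1 marks no data; the function names, the label column name, and the CSV layout are hypothetical.

```python
import numpy as np


def sample_balanced_pixels(features, labels, n_per_class, nodata=-1, seed=0):
    """Draw up to n_per_class random pixels from each label class, skipping nodata.

    features: (n_pixels, n_features) array; labels: (n_pixels,) integer array.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features)
    labels = np.asarray(labels).ravel()
    picks = []
    for cls in np.unique(labels):
        if cls == nodata:
            continue  # never train on unlabelled pixels
        idx = np.flatnonzero(labels == cls)
        picks.append(rng.choice(idx, size=min(n_per_class, idx.size), replace=False))
    if not picks:
        raise ValueError("no labelled pixels to sample")
    chosen = rng.permutation(np.concatenate(picks))  # shuffle so classes are interleaved
    return features[chosen], labels[chosen]


def samples_to_csv(X, y, feature_names):
    """Render samples as CSV text with a header row; 'label' is an assumed column name."""
    header = ",".join(list(feature_names) + ["label"])
    rows = [",".join(f"{v:.6g}" for v in row) + f",{int(lbl)}" for row, lbl in zip(X, y)]
    return "\n".join([header] + rows) + "\n"
```

The resulting text can be written with pathlib, e.g. Path("data/training_data/my_rf_samples.csv").write_text(samples_to_csv(Xs, ys, names)).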

3. Running Inference

poetry run python -m src.flood_detector.predict \
  --model_path results/models/unet_model_v1.h5 \
  --input_image data/processed/post_flood_event.tif \
  --output_map results/maps/flood_map_event.tif

Example on a Sen1Floods11 sample tile:

poetry run python -m src.flood_detector.predict \
  --model_path results/models/unet_final_test_best.h5 \
  --post_flood_image data/training_data/Sen1Floods11/train/s1_post/Bolivia_76104_S1Hand.tif \
  --output_map results/maps/flood_map_sample.tif
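Full Sentinel-1 scenes are far larger than a typical U-Net input, so inference is commonly run tile by tile. The following is a minimal sketch under that assumption; predict_fn stands in for a trained model that returns per-pixel flood probabilities, and the tile size and 0.5 threshold are illustrative defaults, not values taken from predict.py.

```python
import numpy as np


def predict_tiled(image, predict_fn, tile=256, threshold=0.5):
    """Apply predict_fn tile by tile over an (H, W, C) array and return a 0/1 flood mask.

    predict_fn is assumed to map a (tile, tile, C) array to a (tile, tile)
    array of flood probabilities. The image is zero-padded to a multiple of
    `tile`, and the padding is cropped from the result.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))  # constant (zero) padding
    prob = np.zeros(padded.shape[:2], dtype=np.float32)
    for r in range(0, padded.shape[0], tile):
        for c in range(0, padded.shape[1], tile):
            prob[r:r + tile, c:c + tile] = predict_fn(padded[r:r + tile, c:c + tile])
    return (prob[:h, :w] >= threshold).astype(np.uint8)
```

Non-overlapping tiles can leave visible seams at tile borders; predicting overlapping tiles and averaging the probabilities is a common refinement.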

Evaluation

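Flooded pixels are usually a small minority of each scene, so overall accuracy can look high even when most floodwater is missed. The pipeline therefore reports:

  • IoU: Intersection over union of the predicted and reference flood areas
  • F1-Score: Harmonic mean of precision and recall
  • Precision: Fraction of pixels predicted as flooded that are actually flooded
  • Recall: Fraction of actually flooded pixels that the model detects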
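All four metrics follow from pixel counts of true positives (TP), false positives (FP), and false negatives (FN) for the flood class: IoU = TP / (TP + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2TP / (2TP + FP + FN). The sketch below is an illustration, not evaluate.py; it assumes labels use 1 for water, 0 for dry land, and -1 for no data (the Sen1Floods11 hand-label convention), and the function name is hypothetical.

```python
import numpy as np


def flood_metrics(pred, truth, nodata=-1):
    """IoU, F1, precision and recall for the flood (1) class, ignoring nodata pixels."""
    pred = np.asarray(pred).ravel()
    truth = np.asarray(truth).ravel()
    valid = truth != nodata
    p = pred[valid] == 1
    t = truth[valid] == 1
    tp = np.sum(p & t)
    fp = np.sum(p & ~t)
    fn = np.sum(~p & t)

    def ratio(num, den):
        # Define a metric as 0.0 when its denominator is empty (e.g. no water anywhere).
        return float(num) / float(den) if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }
```

True negatives never appear in these formulas, which is why the large dry-land class cannot inflate the scores.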

Contributing

Contributions are welcome: fork the repository, create a feature branch, and open a pull request.

License

MIT License. See LICENSE for details.

Acknowledgments

  • ESA Copernicus Programme
  • NASA/USGS Landsat Program
  • Cloud to Street (Sen1Floods11)
