Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Muhammad Haris Khan, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan, and Salman Khan
Mohamed bin Zayed University of AI, University College London, Aalto University, Linköping University, Australian National University
- Jun-09-25: 🚀 Initial release of TerraFM codebase and pretrained models
- Jun-09-25: 📄 Paper released on arXiv: https://arxiv.org/abs/2506.06281 🔥🔥
TerraFM is a scalable foundation model designed for unified processing of multisensor Earth Observation (EO) data. Built on a ViT backbone and trained on 18.7M tiles (~23T pixels) from Sentinel-1 SAR and Sentinel-2 optical imagery, TerraFM unifies modality-specific inputs using:
- 🧩 Modality-specific patch embeddings
- 🌀 Adaptive cross-attention fusion
- 🎯 Dual-centering regularization for long-tailed distributions
TerraFM sets a new benchmark on GEO-Bench and Copernicus-Bench, demonstrating strong generalization across geographies, modalities, and tasks — including classification, segmentation, and landslide detection.
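To make the first two components above concrete, here is a minimal, self-contained sketch of how per-sensor patch embeddings and patch-level cross-attention fusion can be wired together. The class names, channel counts, and fusion details are illustrative assumptions, not the released TerraFM code.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the released implementation): each sensor gets its own
# patch-embedding layer, and a cross-attention block fuses the resulting
# patch tokens across modalities. Channel counts and names are illustrative.
class ModalityPatchEmbed(nn.Module):
    def __init__(self, in_chans, embed_dim=768, patch_size=16):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                                 # (B, C, H, W)
        return self.proj(x).flatten(2).transpose(1, 2)    # (B, N, D)

class CrossAttentionFusion(nn.Module):
    def __init__(self, embed_dim=768, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, query_tokens, context_tokens):
        # Patch tokens from one modality attend to the tokens of the other modality.
        fused, _ = self.attn(query_tokens, context_tokens, context_tokens)
        return query_tokens + fused

# Toy usage: fuse Sentinel-2 L2A (12 bands) with Sentinel-1 (2 polarizations).
s2_embed, s1_embed = ModalityPatchEmbed(12), ModalityPatchEmbed(2)
fusion = CrossAttentionFusion()
s2_tokens = s2_embed(torch.randn(1, 12, 224, 224))
s1_tokens = s1_embed(torch.randn(1, 2, 224, 224))
fused_tokens = fusion(s2_tokens, s1_tokens)               # (1, 196, 768)
```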
- Multimodal Pretraining: Uses Sentinel-1 (SAR) and Sentinel-2 (L1C, L2A) as natural augmentations.
- Large-Scale Dataset: Trained on 18.7M global tiles from the Major-TOM dataset.
- Cross-Attention Fusion: Dynamically aggregates information across sensors at patch level.
- Dual-Centering: Mitigates long-tailed land cover bias using ESA WorldCover statistics.
- Benchmark SOTA: Outperforms prior FMs (Galileo, Prithvi, DOFA) across multiple EO tasks.
Overall architecture of TerraFM. It combines a student-teacher contrastive framework, treating modalities as augmentations, with cross-attention fusion and a new dual-centering regularization. TerraFM is built on a ViT backbone, is pre-trained on 18.7M globally distributed samples, and uses large-tile inputs to encode broader spatial context. For illustration, RGB channels from S2-L2A and S2-L1C are shown, and S1 is visualized as a false-color RGB composite.
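For intuition about the objective sketched in the figure, the snippet below is a rough DINO-style formulation in which Sentinel-1 and Sentinel-2 views of the same tile act as natural augmentations and the teacher logits are centered twice: once by a running EMA center and once by a fixed prior built from long-tailed land-cover frequencies (e.g., ESA WorldCover). The temperatures, class frequencies, and the exact form of the second center are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only (not the released training code).
# Assumptions: Sentinel-1 and Sentinel-2 views of the same tile are treated as
# natural augmentations, the teacher is an EMA copy of the student, and its
# logits are centered twice -- by a running EMA center (as in DINO) and by a
# land-cover frequency prior (the 8-class distribution below is made up).
dim = 8
freq = torch.tensor([0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.015, 0.005])
freq_center = freq.log()                 # static, frequency-aware center (assumption)
ema_center = torch.zeros(dim)            # running center, updated each step

def distillation_loss(student_logits, teacher_logits, t_s=0.1, t_t=0.04):
    global ema_center
    teacher = F.softmax((teacher_logits - ema_center - freq_center) / t_t, dim=-1)
    student = F.log_softmax(student_logits / t_s, dim=-1)
    loss = -(teacher * student).sum(dim=-1).mean()
    # update the running center from the current teacher batch
    ema_center = 0.9 * ema_center + 0.1 * teacher_logits.mean(dim=0).detach()
    return loss

# Cross-modal views: student sees the S1 view, teacher sees the S2 view, and vice versa.
s1_logits, s2_logits = torch.randn(4, dim), torch.randn(4, dim)
loss = distillation_loss(s1_logits, s2_logits) + distillation_loss(s2_logits, s1_logits)
print(loss.item())
```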
Model | Modality | Input Size | Backbone | Link |
---|---|---|---|---|
TerraFM-B | Sentinel-1 RTC + Sentinel-2 Level 2A + Sentinel-2 Level 1C | 224×224 | ViT-Base | Download |
TerraFM-L | Sentinel-1 RTC + Sentinel-2 Level 2A + Sentinel-2 Level 1C | 224×224 | ViT-Large | Download |
TerraFM can be used directly via the `terrafm.py` module, which provides standalone implementations of the TerraFM-Base and TerraFM-Large models for easy integration into any codebase.
```python
from terrafm import terrafm_base, terrafm_large
import torch

# Simulated input: 1 sample, 12 channels, 224×224 resolution (e.g., Sentinel-2 L2A)
x = torch.randn(1, 12, 224, 224)

# Load TerraFM-Base model
model = terrafm_base()

# Load pretrained weights (e.g., TerraFM-B.pth)
state_dict = torch.load("TerraFM-B.pth", map_location="cpu")
msg = model.load_state_dict(state_dict, strict=False)

# Forward pass
y = model(x)
print(f"Output shape: {y.shape}")
```
We evaluate image classification using k-nearest neighbors (kNN) and report Top-1 accuracy for all single-label tasks. For the multilabel BigEarthNet benchmark, we report the F1 score.
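As a concrete (unofficial) illustration of this protocol, the sketch below fits a kNN classifier on frozen TerraFM features with scikit-learn. The data loaders and the assumption that `model(x)` returns one pooled feature vector per image are placeholders, not part of the benchmark harness.

```python
import torch
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Minimal sketch of kNN evaluation on frozen features (not the official harness).
# Assumptions: `model` is a loaded TerraFM encoder returning one feature vector per
# image, and train_loader / test_loader yield (image, label) batches.
@torch.no_grad()
def extract_features(model, loader, device="cpu"):
    model.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        feats.append(model(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

def knn_top1(model, train_loader, test_loader, k=20):
    train_x, train_y = extract_features(model, train_loader)
    test_x, test_y = extract_features(model, test_loader)
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    knn.fit(train_x, train_y)
    return accuracy_score(test_y, knn.predict(test_x))
```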
Model | Backbone | m-EuroSat (100%) | m-EuroSat (1%) | m-BigEarthNet (100%) | m-BigEarthNet (1%) | m-So2Sat (100%) | m-So2Sat (1%) | m-Brick-Kiln (100%) | m-Brick-Kiln (1%) |
---|---|---|---|---|---|---|---|---|---|
SatMAE | ViT-Base | 84.1 | 34.8 | 50.6 | 29.0 | 36.0 | 23.1 | 86.1 | 73.5 |
SatMAE++ | ViT-Large | 82.7 | 48.5 | 50.8 | 31.6 | 34.7 | 23.4 | 89.6 | 76.7 |
CROMA | ViT-Base | 85.6 | 51.3 | 58.8 | 44.7 | 48.8 | 33.8 | 92.6 | 85.1 |
SoftCon | ViT-Small | 89.8 | 27.2 | 64.7 | 43.3 | 51.1 | 31.4 | 89.2 | 77.8 |
DOFA | ViT-Base | 82.8 | 49.6 | 49.4 | 29.9 | 41.4 | 29.4 | 88.3 | 78.3 |
Satlas | Swin-Tiny | 81.7 | 35.8 | 51.9 | 29.6 | 36.6 | 27.1 | 88.2 | 73.0 |
MMEarth | CNN-atto | 81.7 | 30.0 | 58.3 | 39.6 | 39.8 | 25.1 | 89.4 | 79.7 |
DeCUR | ViT-Small | 89.0 | 46.6 | 63.8 | 49.6 | 45.8 | 30.9 | 83.7 | 74.2 |
AnySat | ViT-Base | 82.2 | 47.1 | 54.9 | 33.7 | 39.8 | 29.0 | 85.3 | 72.0 |
Galileo | ViT-Base | 93.0 | 56.6 | 59.0 | 36.5 | 54.8 | 43.2 | 90.7 | 78.0 |
Prithvi-2.0 | ViT-Large | 80.2 | 48.0 | 49.4 | 28.8 | 29.5 | 26.1 | 87.9 | 80.6 |
Copernicus-FM | ViT-Base | 76.0 | 47.4 | 53.8 | 33.3 | 38.4 | 23.3 | 93.0 | 83.2 |
TerraFM | ViT-Base | 94.2 | 59.3 | 68.7 | 49.4 | 55.1 | 41.6 | 94.5 | 85.6 |
TerraFM | ViT-Large | 95.1 | 62.1 | 69.4 | 50.6 | 55.9 | 41.1 | 93.0 | 82.2 |
Comparison of TerraFM with existing supervised and self-supervised methods on Copernicus-Bench.
Metrics include OA (Overall Accuracy), mAP (mean Average Precision), and mIoU (mean Intersection over Union).
Dataset | Metric | Supervised | Random | SoftCon | CROMA | DOFA | Copernicus-FM | TerraFM |
---|---|---|---|---|---|---|---|---|
Backbone | -- | ViT-B/16 | ViT-B/16 | ViT-B/14 | ViT-B/8 | ViT-B/16 | ViT-B/16 | ViT-B/16 |
Cloud-S2 | mIoU | 59.4 | 60.4 | 66.9 | 65.0 | 65.0 | 66.7 | 67.9 |
EuroSAT-S1 | OA | 81.5 | 75.4 | 83.6 | 83.9 | 81.7 | 87.2 | 87.8 |
EuroSAT-S2 | OA | 97.6 | 92.5 | 96.7 | 97.0 | 97.2 | 97.9 | 99.1 |
BigEarthNet-S1 | mAP | 70.6 | 63.8 | 78.7 | 70.8 | 70.5 | 77.9 | 76.9 |
BigEarthNet-S2 | mAP | 80.1 | 71.6 | 83.6 | 76.4 | 75.5 | 79.0 | 84.4 |
DFC2020-S1 | mIoU | 50.8 | 45.4 | 52.8 | 52.7 | 49.7 | 52.4 | 55.4 |
DFC2020-S2 | mIoU | 66.2 | 62.3 | 64.1 | 66.5 | 61.8 | 64.5 | 63.8 |
LCZ-S2 | OA | 85.3 | 77.4 | 83.6 | 84.1 | 83.0 | 84.4 | 87.0 |
Performance comparison on GEO-Bench covering classification (Top-1 accuracy), segmentation (mIoU), and multi-label classification (F1 score for m-BigEarthNet).
TerraFM achieves state-of-the-art results across multiple datasets, outperforming previous foundation models.
Method | Backbone | m-EuroSat | m-BigEarthNet | m-So2Sat | m-Brick-Kiln | m-Cashew-Plant | m-SA-Crop-Type |
---|---|---|---|---|---|---|---|
SatMAE | ViT-Large | 96.6 | 68.3 | 57.2 | 98.4 | 30.8 | 24.8 |
SatMAE++ | ViT-Large | 96.5 | 67.9 | 56.0 | 98.6 | 29.6 | 25.7 |
CROMA | ViT-Large | 96.6 | 71.9 | 60.6 | 98.7 | 31.8 | 32.0 |
SoftCon | ViT-Base | 97.5 | 70.3 | 61.7 | 98.7 | 29.6 | 30.8 |
DOFA | ViT-Large | 96.9 | 68.0 | 58.7 | 98.6 | 27.7 | 25.4 |
Satlas | Swin-Base | 97.5 | 72.8 | 61.9 | 98.9 | 25.1 | 23.4 |
MMEarth | CNN-atto | 95.7 | 70.0 | 57.2 | 98.9 | 24.2 | 22.2 |
DeCUR | ViT-Small | 97.9 | 70.9 | 61.7 | 98.7 | 26.2 | 21.5 |
Prithvi 2.0 | ViT-Large | 96.5 | 69.0 | 54.6 | 98.6 | 26.7 | 22.9 |
AnySat | ViT-Base | 95.9 | 70.3 | 51.8 | 98.6 | 26.1 | 27.1 |
Galileo | ViT-Base | 97.7 | 70.7 | 63.3 | 98.7 | 33.0 | 30.1 |
TerraFM | ViT-Base | 98.1 | 72.6 | 64.9 | 98.7 | 34.1 | 33.0 |
TerraFM | ViT-Large | 98.6 | 73.1 | 66.6 | 99.0 | 37.2 | 34.5 |
Landslide detection performance on the Landslide4Sense test set.
Despite having significantly fewer parameters (120M vs. 300M), TerraFM achieves higher overall segmentation performance, especially for landslide regions.
Model | mIoU | IoU (Landslide) |
---|---|---|
Prithvi-EO-2.0 (300M) | 65.0 | 31.5 |
TerraFM (120M) | 70.8 | 43.1 |
If you find our work and this repository useful, please consider giving our repo a star and citing our paper as follows:
```bibtex
@article{danish2025terrafmscalablefoundationmodel,
  title={TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation},
  author={Muhammad Sohail Danish and Muhammad Akhtar Munir and Syed Roshaan Ali Shah and Muhammad Haris Khan and Rao Muhammad Anwer and Jorma Laaksonen and Fahad Shahbaz Khan and Salman Khan},
  year={2025},
  eprint={2506.06281},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.06281},
}
```
If you have any questions, please create an issue on this repository or contact us at muhammad.sohail@mbzuai.ac.ae.