Liang Yao (姚亮)*, Fan Liu (刘凡)* ✉, Delong Chen (陈德龙)*, Chuanyi Zhang (张传一), Yijun Wang (王翌骏), Ziyun Chen (陈子赟), Wei Xu (许玮), Shimin Di (邸世民), Yuhui Zheng (郑钰辉)

* Equal Contribution ✉ Corresponding Author
Model: 🤗RemoteSAM
Dataset: 🤗RemoteSAM-270K
- 2025/5/7: We have released the model and dataset! You can download RemoteSAM-270K from 🤗RemoteSAM-270K and the checkpoint from 🤗RemoteSAM.
- 2025/5/3: Welcome to RemoteSAM! The preprint of our paper is available. Dataset and model are open-sourced at this repository.
Welcome to the official repository of our paper "RemoteSAM: Towards Segment Anything for Earth Observation"!
Recent advances in AI have revolutionized Earth observation, yet most remote sensing tasks still rely on specialized models with fragmented interfaces. To address this, we present RemoteSAM, a vision foundation model that unifies pixel-, region-, and image-level tasks through a novel architecture centered on Referring Expression Segmentation (RES). Unlike existing paradigms—task-specific heads with limited knowledge sharing or text-based models struggling with dense outputs—RemoteSAM leverages pixel-level predictions as atomic units, enabling upward compatibility to higher-level tasks while eliminating computationally heavy language model backbones. This design achieves an order-of-magnitude parameter reduction (billions to millions), enabling efficient high-resolution data processing.
We also build the RemoteSAM-270K dataset, a large-scale collection of 270K Image-Text-Mask triplets generated via an automated pipeline powered by vision-language models (VLMs). This dataset surpasses existing resources in semantic diversity, covering 1,000+ object categories and rich attributes (e.g., color, spatial relations) through linguistically varied prompts. We further introduce RSVocab-1K, a hierarchical semantic vocabulary used to quantify dataset coverage and adaptability.
The code has been verified to work with PyTorch v1.13.0 and Python 3.8.
- Clone this repository.
- Change directory to root of this repository.
- Create a new Conda environment with Python 3.8 then activate it:
conda create -n RemoteSAM python==3.8
conda activate RemoteSAM
- Install PyTorch v1.13.0 with a CUDA version that works on your cluster/machine (CUDA 11.6 is used in this example):
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
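- Optionally, verify the installation and GPU visibility from a Python shell:

```python
import torch

print(torch.__version__)          # should print 1.13.0+cu116
print(torch.cuda.is_available())  # should print True if the CUDA setup is working
```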
- Install mmcv from openmmlab:
pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.13.0/index.html
- Install the packages in `requirements.txt` via pip:
pip install -r requirements.txt
- Create the `./pretrained_weights` directory where we will be storing the weights.
mkdir ./pretrained_weights
- Download the pre-trained classification weights of the Swin Transformer and put the `.pth` file in `./pretrained_weights`. These weights are needed to initialize the model for training.
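- Optionally, check that the downloaded weights load before training. This is only a minimal sketch; the file name below is an example and may differ from the checkpoint you actually downloaded:

```python
import torch

# NOTE: example file name -- replace it with the Swin checkpoint you downloaded.
state = torch.load("./pretrained_weights/swin_base_patch4_window12_384_22k.pth",
                   map_location="cpu")
# Official Swin release checkpoints typically store the parameters under a 'model' key.
print(list(state.keys()))
```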
We perform all experiments on our proposed RemoteSAM-270K dataset.
- Download our dataset from HuggingFace.
- Copy all the downloaded files to `./refer/data/`. The dataset folder should look like this:
$DATA_PATH
├── RemoteSAM-270K
│   ├── JPEGImages
│   ├── Annotations
│   ├── refs(unc).p
│   └── instances.json
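To verify the download, the annotation files can be inspected directly. The sketch below assumes the paths from the layout above and a standard RefCOCO-style `refer` format (a pickled list of referring-expression records plus a COCO-style `instances.json`); exact field names may differ:

```python
import json
import pickle

data_root = "./refer/data/RemoteSAM-270K"

# refs(unc).p: pickled referring-expression records (RefCOCO-style convention assumed).
with open(f"{data_root}/refs(unc).p", "rb") as f:
    refs = pickle.load(f)
print(len(refs), "referring-expression records")
print(refs[0].keys())

# instances.json: COCO-style annotation file (images / annotations / categories expected).
with open(f"{data_root}/instances.json") as f:
    instances = json.load(f)
print(instances.keys())
```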
We use PyTorch's DistributedDataParallel for training; more training settings can be changed in `args.py`. To run on 8 GPUs on a single node:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch \
--nproc_per_node 8 --master_port 12345 train.py \
--epochs 40 --img_size 896 2>&1 | tee ./output
To get started with RemoteSAM, please first initialize a model and load the RemoteSAM checkpoint with a few lines of code:
from tasks.code.model import RemoteSAM, init_demo_model
import cv2
import numpy as np
device = 'cuda:0'
checkpoint = "./pretrained_weights/checkpoint.pth"
model = init_demo_model(checkpoint, device)
model = RemoteSAM(model, device, use_EPOC=True)
Then, you can explore the different tasks supported by RemoteSAM with the examples below; a short sketch for visualizing the returned masks follows the list:
- Referring Expression Segmentation
image = cv2.imread("./assets/demo.jpg")
mask = model.referring_seg(image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB), sentence="the airplane on the right")
- Semantic Segmentation
image = cv2.imread("./assets/demo.jpg")
result = model.semantic_seg(image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB), classnames=['airplane', 'vehicle'])
for classname in ["airplane", "vehicle"]:
    mask = result[classname]
- Object Detection
image = cv2.imread("./assets/demo.jpg")
result = model.detection(image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB), classnames=['airplane', 'vehicle'])
for classname in ["airplane", "vehicle"]:
    boxes = result[classname]
- Visual Grounding
image = cv2.imread("./assets/demo.jpg")
box = model.visual_grounding(image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB), sentence="the airplane on the right")
- Multi-Label Classification
image = cv2.imread("./assets/demo.jpg")
result = model.multi_label_cls(image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB), classnames=['airplane', 'vehicle'])
print(result)
- Image Classification
image = cv2.imread("./assets/demo.jpg")
result = model.multi_class_cls(image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB), classnames=['airplane', 'vehicle'])
print(result)
- Image Captioning
image = cv2.imread("./assets/demo.jpg")
result = model.captioning(image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB), classnames=['airplane', 'vehicle'], region_split=9)
print(result)
- Object Counting
image = cv2.imread("./assets/demo.jpg")
result = model.counting(image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB), classnames=['airplane', 'vehicle'])
for classname in ["airplane", "vehicle"]:
    print("{}: {}".format(classname, result[classname]))
- Evaluation of Referring Expression Segmentation
bash tasks/REF.sh
- Evaluation of Semantic Segmentation
bash tasks/SEG.sh
- Evaluation of Object Detection
bash tasks/DET.sh
- Evaluation of Visual Grounding
bash tasks/VG.sh
- Evaluation of Multi-Label Classification
bash tasks/MLC.sh
- Evaluation of Image Classification
bash tasks/MCC.sh
- Evaluation of Image Captioning
bash tasks/CAP.sh
- Evaluation of Object Counting
bash tasks/CNT.sh
- We thank Lu Wang (王璐) for his efforts on the RemoteSAM-270K dataset.
- The code in this repository is built on RMSIN. We thank the authors for open-sourcing their project.
For questions, please contact yaoliang@hhu.edu.cn.
If you find this work useful, please cite our paper as:
@misc{yao2025RemoteSAM,
title={RemoteSAM: Towards Segment Anything for Earth Observation},
author={Liang Yao and Fan Liu and Delong Chen and Chuanyi Zhang and Yijun Wang and Ziyun Chen and Wei Xu and Shimin Di and Yuhui Zheng},
year={2025},
eprint={2505.18022},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.18022},
}