
Low-Rank Prompt Adaptation for Open-Vocabulary Object Detection

Official implementation of:

📄 Low-Rank Prompt Adaptation for Open-Vocabulary Object Detection

👨‍💻 Zekun Zhang*, Vu Quang Truong*, Minh Hoai (*equal contribution)

🎯 Accepted at ICCV 2025 MMFM Workshop


🔍 Overview

[Figure: Method overview]

We propose a low-rank prompt enhancer module that adapts open-vocabulary object detectors (OVDs) such as GroundingDINO without changing their backbone or head (a minimal sketch of the idea follows the list below). The enhancer is:

  • lightweight and parameter-efficient,
  • able to learn improved prompts from only a few labeled images, and
  • easy to integrate into Grounded SAM 2 for unseen object instance segmentation (UOIS).
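
The exact architecture is described in the paper; below is a minimal PyTorch sketch of the general low-rank residual idea. The class name, tensor shapes, and residual form are illustrative assumptions, not the repository's actual implementation.

import torch
import torch.nn as nn

class LowRankPromptEnhancer(nn.Module):
    # Illustrative sketch only: a rank-r residual adapter over prompt
    # embeddings; names and shapes are assumptions, not this repo's code.
    def __init__(self, dim: int, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)  # dim -> rank
        self.up = nn.Linear(rank, dim, bias=False)    # rank -> dim
        nn.init.zeros_(self.up.weight)  # start as an identity mapping

    def forward(self, prompts: torch.Tensor) -> torch.Tensor:
        # Residual update: the frozen detector sees its original prompts
        # plus a learned low-rank correction.
        return prompts + self.up(self.down(prompts))

# Example: enhance a batch of text-prompt embeddings of width 256.
enhancer = LowRankPromptEnhancer(dim=256, rank=16)
prompts = torch.randn(4, 7, 256)  # (batch, tokens, embedding dim)
enhanced = enhancer(prompts)

Because only the rank-r factors are trained, the added parameter count stays tiny (roughly 2 × dim × rank per adapted stream), which is what makes the method parameter-efficient.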

🌟 Highlights

  • ✅ Improves GroundingDINO across multiple OVD datasets
  • ✅ Outperforms LoRA, LoSA, BitFit, Prompt Tuning, Res-Tuning, and full fine-tuning
  • ✅ Enables Grounded SAM 2 to achieve SOTA on UOIS with only 50 box-labeled images

📦 Installation
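
A typical setup might look like the following (the requirements.txt name is an assumption; adjust it to the repository's actual dependency files, and note that the GroundingDINO and Grounded SAM 2 dependencies may also be needed for the baseline scripts):

git clone https://github.com/cvlab-stonybrook/PromptAdaptOVD.git
cd PromptAdaptOVD
pip install -r requirements.txt  # hypothetical file name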


📁 Data Preparation

You will need to download all datasets manually, extract them, and place them at the same directory level as this repository. The expected structure looks like this:

root_dir/
├── PromptAdaptOVD/
├── EgoPER/
├── MSCOCO2017/
├── RarePlanes/
├── PTG/            # EgoPER
├── OIH_VIS/        # HOIST
├── odinw_13/
├── OCID/
├── HouseCat6D/
└── ...

🖼 Scenes100 Exception

This repository only uses the annotated subset of Scenes100. You must ensure that the folder:

PromptAdaptOVD/images/annotated/

contains all the annotated images and their metadata. If this folder is missing, Scenes100 experiments will not run correctly.
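
A quick sanity check (paths taken from the layout above) is to list the folder and confirm it is non-empty:

ls PromptAdaptOVD/images/annotated/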


📥 Model Weights

You can download our pretrained enhancer weights here:

➡️ Download Model Weights (Hugging Face)

Place the contents of the extracted weights folder into PromptAdaptOVD/scripts/groundingdino_baseline.


🚀 Getting Started

🔧 Training

cd scripts/groundingdino_baseline
bash train_enhancer.sh <rank> <type>

🧪 Evaluation

cd scripts/groundingdino_baseline
bash eval_enhancer.sh <rank> <type>

Here, <rank> is the rank of the enhancer and <type> is the feature attention method, which can be both, image, or text. Check the scripts/groundingdino_baseline folder for the scripts of the other methods (e.g., LoRA, LoSA, Res-Tuning).
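
For example, to train and then evaluate a rank-16 enhancer with the feature attention type set to both (matching the Ours (r=16) rows in the results below):

cd scripts/groundingdino_baseline
bash train_enhancer.sh 16 both
bash eval_enhancer.sh 16 both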


🧠 Results

📦 Open-Vocabulary Detection (APm)

| Method | Params % | Scenes100 | EgoPER | HOIST | OV-COCO | RarePlanes | Avg. |
|---|---|---|---|---|---|---|---|
| Base Model | 0% | 30.84 | 24.83 | 17.47 | 19.97 | 41.54 | 26.04 |
| Res-Tuning | 0.06% | 48.59 | 68.05 | 39.61 | 38.04 | 57.36 | 50.33 |
| BitFit | 0.06% | 55.55 | 67.00 | 37.37 | 45.00 | 49.09 | 50.80 |
| LoRA | 0.68% | 55.74 | 67.36 | 37.66 | 44.76 | 52.02 | 51.51 |
| Ours (r=16) | 0.04% | 56.16 | 68.05 | 38.69 | 42.61 | 52.92 | 51.68 |

👉 Our enhancer outperforms all parameter-efficient baselines in average $AP^m$ while using fewer parameters.


🔐 UOIS Segmentation Results

🥣 OCID Dataset (RGB-only Input)

| Method | Training Images | Overlap F | Boundary F | % ≥ 75 |
|---|---|---|---|---|
| UCN | 280,000 | 59.4 | 36.5 | 48.0 |
| UOAIS-Net | 53,450 | 67.9 | 62.3 | 73.1 |
| MSMFormer | 53,450 | 70.5 | 64.9 | 75.3 |
| MSMFormer + Refinement | 53,450 | 66.3 | 54.8 | 52.8 |
| UOIS-SAM | 5,345 | 79.9 | 72.5 | 78.3 |
| Ours (r=16) | 50 | 77.2 | 73.7 | 74.0 |

🏠 HouseCat6D Dataset

| Method | Input | Training Images | Overlap F | Boundary F | % ≥ 75 |
|---|---|---|---|---|---|
| UCN | RGB | 280,000 | 45.0 | 22.5 | 48.4 |
| UOAIS-Net | RGB | 53,450 | 60.3 | 52.8 | 81.2 |
| MSMFormer | RGB | 53,450 | 67.3 | 57.6 | 80.4 |
| MSMFormer + Refinement | RGB | 53,450 | 66.7 | 54.9 | 71.3 |
| UOIS-SAM | RGB | 5,345 | 70.0 | 66.2 | 84.8 |
| Ours (r=16) | RGB | 50 | 82.7 | 78.9 | 89.7 |

📌 All methods above use RGB-only input. Our approach uses only 50 images with box annotations, yet achieves performance competitive with methods trained on thousands of images with mask annotations.


📘 Citation

If you find our work useful, please cite:

@inproceedings{zhang2025lowrank,
  title     = {Low-Rank Prompt Adaptation for Open-Vocabulary Object Detection},
  author    = {Zekun Zhang and Vu Quang Truong and Minh Hoai},
  booktitle = {ICCV Workshop on Multi-modal Foundation Models (MMFM)},
  year      = {2025}
}

🤝 Contact
