This repository contains the PyTorch implementation of our work at CVPR 2025:
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning. Jinpeng Wang*, Tianci Luo*, Yaohua Zha, Yan Feng, Ruisheng Luo, Bin Chen, Tao Dai, Long Chen, Yaowei Wang, Shu-Tao Xia.
We devise CONDENSER, a lightweight external plugin that compresses relevant fine-grained context across multiple prompts. Optimized end-to-end with the backbone and an extra pre-alignment objective, CONDENSER ensures stability and accurate integration of contextual cues.
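As a rough illustration of the idea (a minimal sketch, not the released architecture; the dimensions, attention design, and class name below are our assumptions), a condenser can be viewed as a small cross-attention module that lets query-image tokens absorb condensed context from several prompt pairs at once:

```python
import torch
import torch.nn as nn

class PromptCondenser(nn.Module):
    """Hypothetical sketch: condense tokens from K prompt pairs into the
    query stream via cross-attention; dimensions and design are assumptions."""
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_p = nn.LayerNorm(dim)

    def forward(self, query_tokens, prompt_tokens):
        # query_tokens: (B, N, D); prompt_tokens: (B, K, M, D) for K prompts
        B, K, M, D = prompt_tokens.shape
        context = self.norm_p(prompt_tokens.reshape(B, K * M, D))
        out, _ = self.attn(self.norm_q(query_tokens), context, context)
        return query_tokens + out  # residual injection of condensed context
```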
In the following, we will guide you through using this repository step by step. 🤗
git clone https://github.com/gimpong/CVPR25-Condenser.git
cd CVPR25-Condenser
conda create -n condenser python=3.8 -y
conda activate condenser
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
Download the Pascal-5i, Pascal VOC 2012, ImageNet, and MSCOCO datasets.
The working directory is expected to be organized as below:
CVPR25-Condenser/
- Codes/ (all code files)
  - .../
- Data/
  - coco/
    - Coco_Trainlabel/
    - Coco_Vallabel/
    - trn2014/
    - val2014/
  - imagenet/
    - test_data/
    - test_label/
    - train_data/
    - train_label/
  - output/
    - logs/
    - visual_examples/
  - pascal-5i/
  - save_ours_ckpt/
  - ckpt/
  - splits/
  - weights/
    - vqgan/
      - last.ckpt
      - model.yaml
    - checkpoint-1000.pth
Please follow Visual Prompting to prepare the model, and download the CVF 1000-epoch pre-trained checkpoint (checkpoint-1000.pth) into Data/weights/.
We will use Foreground Segmentation as an example to illustrate the workflow of the code.
We select the most relevant prompt samples via feature-space retrieval.
First, we extract pixel-level features using CLIP's visual encoder, separately for the val set and the train set.
python Codes/tools/feature_extractor_folderwise_segmentation.py vit_large_patch14_clip_224.laion2b features_vit-laion2b_pixel-level val
python Codes/tools/feature_extractor_folderwise_segmentation.py vit_large_patch14_clip_224.laion2b features_vit-laion2b_pixel-level trn
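The extractor script above handles this for the whole dataset; conceptually, grabbing pixel-level (patch-token) features from the CLIP ViT via timm looks roughly like the sketch below (the image path is a placeholder, and the exact token layout is our assumption):

```python
import timm
import torch
from PIL import Image

# Conceptual sketch of pixel-level (patch-token) feature extraction with the
# CLIP ViT named in the command above; the image path is a placeholder.
model = timm.create_model("vit_large_patch14_clip_224.laion2b",
                          pretrained=True, num_classes=0).eval()
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

img = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    tokens = model.forward_features(img)  # (1, 1 + 16*16 patches, 1024)
patch_feats = tokens[:, 1:, :]            # drop the CLS token; keep per-patch
```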
Then, we compute a similarity matrix from these features and extract the names of the top-50 most similar prompts.
python Codes/tools/calculate_similariity.py features_vit-laion2b_pixel-level val trn
python Codes/tools/calculate_similariity.py features_vit-laion2b_pixel-level trn trn
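Under the hood, this step amounts to a cosine-similarity matrix followed by a top-k lookup. A minimal sketch with random stand-in features (shapes and loading are placeholders, not the script's actual I/O):

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the retrieval step with random stand-in features;
# actual feature loading and saving of prompt names happen in the script.
val_feats = torch.randn(100, 1024)    # (num_val, D)
trn_feats = torch.randn(5000, 1024)   # (num_train, D)

sim = F.normalize(val_feats, dim=-1) @ F.normalize(trn_feats, dim=-1).T
top50 = sim.topk(50, dim=-1).indices  # top-50 most similar prompts per query
```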
Next, we preprocess the features so that they can be used directly as embeddings for the visual prompts and queries.
python Codes/tools/calculate_pre_feature_for_query.py
python Codes/tools/calculate_pre_feature_for_support.py
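A rough sketch of the intent, with toy shapes (the scripts' actual feature sizes and on-disk layout may differ, and the file name below is a placeholder): gather each query's retrieved prompt features and cache everything in a layout the training code can consume directly.

```python
import torch

# Toy shapes only; the actual scripts' feature sizes and on-disk layout
# may differ, and the file name below is a placeholder.
num_trn, num_val, tokens, dim = 500, 10, 16, 64
trn_feats = torch.randn(num_trn, tokens, dim)     # cached train features
val_feats = torch.randn(num_val, tokens, dim)     # cached val features
top50 = torch.randint(0, num_trn, (num_val, 50))  # retrieval indices from above

query_emb = val_feats                   # queries: used as embeddings directly
support_emb = trn_feats[top50]          # (num_val, 50, tokens, dim) prompts
torch.save({"query": query_emb, "support": support_emb}, "pre_features.pt")
```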
python3 Codes/train_vp_segmentation.py \
--mode spimg_spmask \
--output_dir Data/output/logs/ \
--device cuda:0 \
--base_dir Data/pascal-5i/ \
--batch-size 16 \
--lr 0.03 \
--epoch 150 \
--scheduler cosinewarm \
--arr a1 \
--vp-model Prompt \
--p-eps 1 \
--ckpt Data/weights/checkpoint-1000.pth \
--vq_ckpt_dir Data/weights/vqgan \
--save_base_dir Data/ \
--simidx 16 \
--dropout 0.25 \
--choice Zero \
--loss_mean 1 \
--align_q 0 \
--fold 3
<fold>: fold id of Pascal-5i / COCO-5i.
<simidx>: number of prompt pairs.
- Replace train_vp_segmentation.py with train_vp_detection.py to train for single-object detection.
- Replace train_vp_segmentation.py with train_vp_coloring.py and replace --base_dir Data/pascal-5i/ with --base_dir Data/imagenet/ to train for coloring (see the example command after this list).
- Change the value of simidx to set the number of prompt pairs used during training.
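For example, combining the first two substitutions above, a coloring training run looks like this, with all other flags carried over unchanged from the segmentation command:

python3 Codes/train_vp_coloring.py \
--mode spimg_spmask \
--output_dir Data/output/logs/ \
--device cuda:0 \
--base_dir Data/imagenet/ \
--batch-size 16 \
--lr 0.03 \
--epoch 150 \
--scheduler cosinewarm \
--arr a1 \
--vp-model Prompt \
--p-eps 1 \
--ckpt Data/weights/checkpoint-1000.pth \
--vq_ckpt_dir Data/weights/vqgan \
--save_base_dir Data/ \
--simidx 16 \
--dropout 0.25 \
--choice Zero \
--loss_mean 1 \
--align_q 0 \
--fold 3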
The logs and model checkpoints will be generated under the Data/output/logs/ and Data/save_ours_ckpt/ folders, respectively.
We provide the evaluation code for model checkpoints (if they exist). The test command is as follows:
python3 Codes/val_vp_segmentation.py \
--fold 1 \
--mode spimg_spmask \
--output_dir Data/output/logs/ \
--device cuda:0 \
--base_dir Data/pascal-5i/ \
--batch-size 8 \
--lr 0.03 \
--epoch 150 \
--arr a1 \
--vp-model Prompt \
--p-eps 1 \
--ckpt Data/weights/checkpoint-1000.pth \
--vq_ckpt_dir Data/weights/vqgan \
--save_base_dir Data/ \
--simidx 1 \
--dropout 0.25 \
--align_q 0 \
--save_model_path SAVE_MODEL_PATH
- Replace val_vp_segmentation.py with val_vp_detection.py to run inference for single-object detection.
- Replace val_vp_segmentation.py with val_vp_coloring.py and replace --base_dir Data/pascal-5i/ with --base_dir Data/imagenet/ to run inference for coloring.
- Change the value of simidx to set the number of prompt pairs used during inference.
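For reference, the segmentation metric reported below is mean IoU. A minimal sketch of how it can be computed over binary masks is shown here (the repository's own evaluation code may differ in details such as averaging order and thresholding):

```python
import torch

def mean_iou(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6) -> float:
    """Mean IoU over a batch of binary masks of shape (B, H, W)."""
    pred, gt = pred.bool(), gt.bool()
    inter = (pred & gt).flatten(1).sum(-1).float()  # per-image intersection
    union = (pred | gt).flatten(1).sum(-1).float()  # per-image union
    return ((inter + eps) / (union + eps)).mean().item()

# Sanity check: identical masks give IoU = 1.0.
assert abs(mean_iou(torch.ones(2, 64, 64), torch.ones(2, 64, 64)) - 1.0) < 1e-4
```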
To make inference easier to reproduce, we also provide a simple bash script. To run it, navigate to the root directory of CVPR25-Condenser and execute the following command:
bash Codes/script/run01.sh
This will complete the relevant inference tasks.
Download the checkpoint to the Data/ckpt path, then run the corresponding .sh file for one-click execution to directly reproduce the results shown in the table below.
| Task (Metric) | Dataset | Fold | Performance (K=1) | Log (K=1) | Checkpoint (K=1) | Script (K=1) | Performance (K=16) | Log (K=16) | Checkpoint (K=16) | Script (K=16) |
|---|---|---|---|---|---|---|---|---|---|---|
| Segmentation (mIoU↑) | Pascal-5i | Fold 0 | 42.13 | Seg_K_1_Fold_0_Log | Seg_K_1_Fold_0.pth | run01.sh | 45.53 | Seg_K_16_Fold_0_Log | Seg_K_16_Fold_0.pth | run02.sh |
| | | Fold 1 | 50.31 | Seg_K_1_Fold_1_Log | Seg_K_1_Fold_1.pth | run03.sh | 52.06 | Seg_K_16_Fold_1_Log | Seg_K_16_Fold_1.pth | run04.sh |
| | | Fold 2 | 42.20 | Seg_K_1_Fold_2_Log | Seg_K_1_Fold_2.pth | run05.sh | 44.33 | Seg_K_16_Fold_2_Log | Seg_K_16_Fold_2.pth | run06.sh |
| | | Fold 3 | 41.90 | Seg_K_1_Fold_3_Log | Seg_K_1_Fold_3.pth | run07.sh | 44.58 | Seg_K_16_Fold_3_Log | Seg_K_16_Fold_3.pth | run08.sh |
| Detection (mIoU↑) | Pascal VOC 2012 | - | 43.22 | Det_K_1_Log | Det_K_1.pth | run09.sh | 44.64 | Det_K_16_Log | Det_K_16.pth | run10.sh |
| Coloring (MSE↓) | ImageNet-1K | - | 0.56 | Col_K_1_Log | Col_K_1.pth | run11.sh | 0.54 | Col_K_16_Log | Col_K_16.pth | run12.sh |
We have also open-sourced the experiment logs and checkpoints for the domain adaptation experiments, where models were trained on COCO-5i and tested on Pascal-5i.
| Task (Metric) | Dataset | Fold | Performance (K=1) | Log (K=1) | Checkpoint (K=1) | Script (K=1) | Performance (K=16) | Log (K=16) | Checkpoint (K=16) | Script (K=16) |
|---|---|---|---|---|---|---|---|---|---|---|
| Segmentation (mIoU↑) | COCO-5i | Fold 0 | 40.39 | Coco_K_1_Fold_0_Log | Coco_K_1_Fold_0.pth | run13.sh | 40.37 | Coco_K_16_Fold_0_Log | Coco_K_16_Fold_0.pth | run14.sh |
| | | Fold 1 | 44.54 | Coco_K_1_Fold_1_Log | Coco_K_1_Fold_1.pth | run15.sh | 44.85 | Coco_K_16_Fold_1_Log | Coco_K_16_Fold_1.pth | run16.sh |
| | | Fold 2 | 40.23 | Coco_K_1_Fold_2_Log | Coco_K_1_Fold_2.pth | run17.sh | 41.03 | Coco_K_16_Fold_2_Log | Coco_K_16_Fold_2.pth | run18.sh |
| | | Fold 3 | 36.33 | Coco_K_1_Fold_3_Log | Coco_K_1_Fold_3.pth | run19.sh | 35.84 | Coco_K_16_Fold_3_Log | Coco_K_16_Fold_3.pth | run20.sh |
We also provide visualized results for some test cases to give readers an intuitive understanding.
If you find our code useful or use the toolkit in your work, please consider citing:
@inproceedings{Wang25_Condenser,
author={Wang, Jinpeng and Luo, Tianci and Zha, Yaohua and Feng, Yan and Luo, Ruisheng and Chen, Bin and Dai, Tao and Chen, Long and Wang, Yaowei and Xia, Shu-Tao},
title={Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
This code is based on our previous work InMeMo.
We are also grateful to other teams for open-sourcing code that inspired our work, including Visual Prompting, visual_prompt_retrieval, timm, and ILM-VP.
If you have any questions, you can raise an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply soon.