
Embracing Collaboration Over Competition:
Condensing Multiple Prompts for Visual In-Context Learning

1. Introduction

This repository contains the PyTorch implementation of our work at CVPR 2025:

Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning. Jinpeng Wang*, Tianci Luo*, Yaohua Zha, Yan Feng, Ruisheng Luo, Bin Chen, Tao Dai, Long Chen, Yaowei Wang, Shu-Tao Xia.

We devise CONDENSER, a lightweight external plugin that compresses relevant fine-grained context across multiple prompts. Optimized end-to-end with the backbone and an extra pre-alignment objective, CONDENSER ensures stability and accurate integration of contextual cues.
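
For intuition only, the sketch below shows one way such a "condensing" plugin could look: a small cross-attention module in which the query-image tokens attend over the tokens of all K retrieved prompt pairs and absorb the condensed context. This is a conceptual illustration, not the exact architecture used in the paper; the names ToyCondenser, dim, and num_heads are purely hypothetical.

import torch
import torch.nn as nn

class ToyCondenser(nn.Module):
    # Illustrative only: condense features from K prompt pairs into the query tokens.
    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_tokens, prompt_tokens):
        # query_tokens:  (B, Nq, dim)      tokens of the query image
        # prompt_tokens: (B, K * Np, dim)  tokens of the K retrieved prompt pairs
        condensed, _ = self.attn(query_tokens, prompt_tokens, prompt_tokens)
        # residual update: query tokens enriched with condensed prompt context
        return self.norm(query_tokens + condensed)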

In the following, we will guide you through using this repository step by step. 🤗

2. Preparation

git clone https://github.com/gimpong/CVPR25-Condenser.git
cd CVPR25-Condenser

2.1 Environment Setup

conda create -n condenser python=3.8 -y
conda activate condenser
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt

2.2 Download the image datasets and organize them properly

Download the Pascal-5i, Pascal VOC 2012, ImageNet, and MSCOCO datasets.

The working directory is expected to be organized as below:

CVPR25-Condenser/
  • Codes/ (all code files)
    • .../
  • Data/
    • coco/
      • Coco_Trainlabel
      • Coco_Vallabel
      • trn2014
      • val2014
    • imagenet/
      • test_data
      • test_label
      • train_data
      • train_label
    • output/
      • logs/
      • visual_examples/
    • pascal-5i/
    • save_ours_ckpt/
    • ckpt/
    • splits/
    • weights/
      • vqgan/
        • last.ckpt
        • model.yaml
      • checkpoint-1000.pth

Please follow Visual Prompting to prepare the model and download the CVF checkpoint pre-trained for 1000 epochs.

We will use Foreground Segmentation as an example to illustrate the workflow of the code.

3. Preprocessing

3.1 Prompt Retriever

We select the most suitable prompt samples through feature-space retrieval.

First, we extract pixel-level features with CLIP's visual encoder, separately for the validation set and the training set.

python Codes/tools/feature_extractor_folderwise_segmentation.py vit_large_patch14_clip_224.laion2b features_vit-laion2b_pixel-level val
python Codes/tools/feature_extractor_folderwise_segmentation.py vit_large_patch14_clip_224.laion2b features_vit-laion2b_pixel-level trn
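
For intuition, the sketch below shows roughly what pixel-level (i.e., patch-token) feature extraction with this CLIP ViT from timm looks like. The file name example.jpg is a placeholder, and the repository script may organize its preprocessing and outputs differently.

import timm
import torch
from PIL import Image

# load the CLIP ViT-L/14 visual encoder from timm (LAION-2B pretrained weights)
model = timm.create_model("vit_large_patch14_clip_224.laion2b", pretrained=True, num_classes=0)
model.eval()
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

img = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # (1, 3, 224, 224)
with torch.no_grad():
    tokens = model.forward_features(img)  # roughly (1, 1 + 256, 1024): class token + 16x16 patch tokens
patch_features = tokens[:, 1:, :]         # per-patch ("pixel-level") features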

Then, we compute a similarity matrix from these features and extract the names of the top-50 most similar prompts.

python Codes/tools/calculate_similariity.py features_vit-laion2b_pixel-level val trn
python Codes/tools/calculate_similariity.py features_vit-laion2b_pixel-level trn trn
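
Conceptually, this step amounts to cosine similarity between L2-normalized features followed by a top-50 lookup, as in the sketch below. The .pt file names are hypothetical stand-ins for the features produced above, not the repository's actual output files.

import torch
import torch.nn.functional as F

val_feats = torch.load("val_features.pt")   # (N_val, D), hypothetical file name
trn_feats = torch.load("trn_features.pt")   # (N_trn, D), hypothetical file name

val_feats = F.normalize(val_feats, dim=-1)
trn_feats = F.normalize(trn_feats, dim=-1)

sim = val_feats @ trn_feats.T               # (N_val, N_trn) cosine-similarity matrix
top50 = sim.topk(k=50, dim=-1).indices      # indices of the 50 most similar training prompts per query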

3.2 Preprocessing Features

We preprocess the features so that they can be used directly as embeddings for the visual prompts and queries.

python Codes/tools/calculate_pre_feature_for_query.py
python Codes/tools/calculate_pre_feature_for_support.py
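
The idea behind this pre-computation can be illustrated as follows: for each query, the features of its retrieved prompt pairs are gathered once and cached, so they can later be fed to the model as embeddings without being recomputed. This is a hypothetical sketch reusing the names from the retrieval sketch above; the repository's actual file formats and variable names may differ.

import torch

trn_feats = torch.load("trn_features.pt")      # (N_trn, D), hypothetical
top50 = torch.load("top50_indices.pt")         # (N_val, 50), hypothetical

support_embeddings = trn_feats[top50]          # (N_val, 50, D): cached prompt embeddings per query
torch.save(support_embeddings, "support_embeddings.pt")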

4. Training and Inference

4.1 Training

python3 Codes/train_vp_segmentation.py \
 --mode spimg_spmask \
 --output_dir Data/output/logs/ \
 --device cuda:0 \
 --base_dir Data/pascal-5i/ \
 --batch-size 16 \
 --lr 0.03 \
 --epoch 150 \
 --scheduler cosinewarm \
 --arr a1 \
 --vp-model Prompt \
 --p-eps 1 \
 --ckpt Data/weights/checkpoint-1000.pth \
 --vq_ckpt_dir Data/weights/vqgan \
 --save_base_dir Data/ \
 --simidx 16 \
 --dropout 0.25 \
 --choice Zero \
 --loss_mean 1 \
 --align_q 0 \
 --fold 3
  • <fold>: fold-id of pascal-5i and coco-5i
  • <simidx>: number of prompt pairs
  1. Replace train_vp_segmentation.py with train_vp_detection.py to train for single object detection.

  2. Replace train_vp_segmentation.py with train_vp_coloring.py, then replace --base_dir Data/pascal-5i/ with --base_dir Data/imagenet/ to train for coloring.

  3. Change the value of simidx to determine the number of prompt pairs used during training.

The logs and model checkpoints will be saved under the Data/output/logs/ and Data/save_ours_ckpt/ folders, respectively.

4.2 Inference

We provide evaluation code for the model checkpoints (if they exist). The test command is as follows:

python3 Codes/val_vp_segmentation.py \
 --fold 1 \
 --mode spimg_spmask \
 --output_dir Data/output/logs/ \
 --device cuda:0 \
 --base_dir Data/pascal-5i/ \
 --batch-size 8 \
 --lr 0.03 \
 --epoch 150 \
 --arr a1 \
 --vp-model Prompt \
 --p-eps 1 \
 --ckpt Data/weights/checkpoint-1000.pth \
 --vq_ckpt_dir Data/weights/vqgan \
 --save_base_dir Data/ \
 --simidx 1 \
 --dropout 0.25 \
 --align_q 0 \
 --save_model_path SAVE_MODEL_PATH
  1. Replace val_vp_segmentation.py with val_vp_detection.py to run inference for single object detection.

  2. Replace val_vp_segmentation.py with val_vp_coloring.py, then replace --base_dir Data/pascal-5i/ with --base_dir Data/imagenet/ to run inference for coloring.

  3. Change the value of simidx to determine the number of prompt pairs used during inference.

To make inference easier for readers, we also provide a simple bash script. To run it, navigate to the root directory of CVPR25-Condenser and execute the following command:

bash Codes/script/run01.sh

This will complete the relevant inference tasks.

5. Results

Download the checkpoints to the Data/ckpt/ directory. Run the corresponding .sh script for one-click execution and to directly reproduce the results shown in the table below.

Task (Metric) | Dataset | K | Performance | Log | Checkpoint | Script
Segmentation (mIoU↑) | Pascal-5i, Fold 0 | 1 | 42.13 | Seg_K_1_Fold_0_Log | Seg_K_1_Fold_0.pth | run01.sh
Segmentation (mIoU↑) | Pascal-5i, Fold 0 | 16 | 45.53 | Seg_K_16_Fold_0_Log | Seg_K_16_Fold_0.pth | run02.sh
Segmentation (mIoU↑) | Pascal-5i, Fold 1 | 1 | 50.31 | Seg_K_1_Fold_1_Log | Seg_K_1_Fold_1.pth | run03.sh
Segmentation (mIoU↑) | Pascal-5i, Fold 1 | 16 | 52.06 | Seg_K_16_Fold_1_Log | Seg_K_16_Fold_1.pth | run04.sh
Segmentation (mIoU↑) | Pascal-5i, Fold 2 | 1 | 42.20 | Seg_K_1_Fold_2_Log | Seg_K_1_Fold_2.pth | run05.sh
Segmentation (mIoU↑) | Pascal-5i, Fold 2 | 16 | 44.33 | Seg_K_16_Fold_2_Log | Seg_K_16_Fold_2.pth | run06.sh
Segmentation (mIoU↑) | Pascal-5i, Fold 3 | 1 | 41.90 | Seg_K_1_Fold_3_Log | Seg_K_1_Fold_3.pth | run07.sh
Segmentation (mIoU↑) | Pascal-5i, Fold 3 | 16 | 44.58 | Seg_K_16_Fold_3_Log | Seg_K_16_Fold_3.pth | run08.sh
Detection (mIoU↑) | Pascal VOC 2012 | 1 | 43.22 | Det_K_1_Log | Det_K_1.pth | run09.sh
Detection (mIoU↑) | Pascal VOC 2012 | 16 | 44.64 | Det_K_16_Log | Det_K_16.pth | run10.sh
Coloring (MSE↓) | ImageNet-1K | 1 | 0.56 | Col_K_1_Log | Col_K_1.pth | run11.sh
Coloring (MSE↓) | ImageNet-1K | 16 | 0.54 | Col_K_16_Log | Col_K_16.pth | run12.sh

We have also open-sourced the experiment logs and checkpoints for the domain-adaptation experiments, in which models are trained on Coco-5i and tested on Pascal-5i.

Task (Metric) | Dataset | K | Performance | Log | Checkpoint | Script
Segmentation (mIoU↑) | Coco-5i, Fold 0 | 1 | 40.39 | Coco_K_1_Fold_0_Log | Coco_K_1_Fold_0.pth | run13.sh
Segmentation (mIoU↑) | Coco-5i, Fold 0 | 16 | 40.37 | Coco_K_16_Fold_0_Log | Coco_K_16_Fold_0.pth | run14.sh
Segmentation (mIoU↑) | Coco-5i, Fold 1 | 1 | 44.54 | Coco_K_1_Fold_1_Log | Coco_K_1_Fold_1.pth | run15.sh
Segmentation (mIoU↑) | Coco-5i, Fold 1 | 16 | 44.85 | Coco_K_16_Fold_1_Log | Coco_K_16_Fold_1.pth | run16.sh
Segmentation (mIoU↑) | Coco-5i, Fold 2 | 1 | 40.23 | Coco_K_1_Fold_2_Log | Coco_K_1_Fold_2.pth | run17.sh
Segmentation (mIoU↑) | Coco-5i, Fold 2 | 16 | 41.03 | Coco_K_16_Fold_2_Log | Coco_K_16_Fold_2.pth | run18.sh
Segmentation (mIoU↑) | Coco-5i, Fold 3 | 1 | 36.33 | Coco_K_1_Fold_3_Log | Coco_K_1_Fold_3.pth | run19.sh
Segmentation (mIoU↑) | Coco-5i, Fold 3 | 16 | 35.84 | Coco_K_16_Fold_3_Log | Coco_K_16_Fold_3.pth | run20.sh

6. Visual Examples

We provide visualized results for several test cases to give readers an intuitive understanding.

Seg_Examples

7. References

If you find our code useful or use the toolkit in your work, please consider citing:

@inproceedings{Wang25_Condenser,
  author={Wang, Jinpeng and Luo, Tianci and Zha, Yaohua and Feng, Yan and Luo, Ruisheng and Chen, Bin and Dai, Tao and Chen, Long and Wang, Yaowei and Xia, Shu-Tao},
  title={Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

8. Acknowledgments

This code is based on our previous work InMeMo.

We are also grateful to other teams for open-sourcing code that inspired our work, including Visual Prompting, visual_prompt_retrieval, timm, and ILM-VP.

9. Contact

If you have any questions, you can raise an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply soon.
