This repository contains the PyTorch implementation of our work at CVPR 2025:
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning. Jinpeng Wang*, Tianci Luo*, Yaohua Zha, Yan Feng, Ruisheng Luo, Bin Chen, Tao Dai, Long Chen, Yaowei Wang, Shu-Tao Xia.
We devise CONDENSER, a lightweight external plugin that compresses relevant fine-grained context across multiple prompts. Optimized end-to-end with the backbone and an extra pre-alignment objective, CONDENSER ensures stability and accurate integration of contextual cues.
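As a rough illustration of the idea (a minimal sketch, not the released architecture; the dimensions, attention design, and class name below are our assumptions), a condenser can be viewed as a small cross-attention module that lets query-image tokens absorb condensed context from several prompt pairs at once:

```python
import torch
import torch.nn as nn

class PromptCondenser(nn.Module):
    """Hypothetical sketch: condense tokens from K prompt pairs into the
    query stream via cross-attention; dimensions and design are assumptions."""
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_p = nn.LayerNorm(dim)

    def forward(self, query_tokens, prompt_tokens):
        # query_tokens: (B, N, D); prompt_tokens: (B, K, M, D) for K prompts
        B, K, M, D = prompt_tokens.shape
        context = self.norm_p(prompt_tokens.reshape(B, K * M, D))
        out, _ = self.attn(self.norm_q(query_tokens), context, context)
        return query_tokens + out  # residual injection of condensed context
```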
In the following, we will guide you through using this repository step by step. 🤗
git clone https://github.com/gimpong/CVPR25-Condenser.git
cd CVPR25-Condenser
conda create -n condenser python=3.8 -y
conda activate condenser
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
Download the Pascal-5i, Pascal VOC 2012, ImageNet, and MSCOCO datasets.
The working directory is expected to be organized as below:
CVPR25-Condenser/
- Codes/ (all code files)
  - .../
- Data/
  - coco/
    - Coco_Trainlabel/
    - Coco_Vallabel/
    - trn2014/
    - val2014/
  - imagenet/
    - test_data/
    - test_label/
    - train_data/
    - train_label/
  - output/
    - logs/
    - visual_examples/
  - pascal-5i/
  - save_ours_ckpt/
  - ckpt/
  - splits/
  - weights/
    - vqgan/
      - last.ckpt
      - model.yaml
    - checkpoint-1000.pth
Please follow Visual Prompting to prepare the model, and download the CVF 1000-epoch pre-trained checkpoint (checkpoint-1000.pth) into Data/weights/.
We will use Foreground Segmentation as an example to illustrate the workflow of the code.
We select the most relevant prompt samples via feature-space retrieval.
First, we extract pixel-level features using CLIP's visual encoder, separately for the val set and the train set.
python Codes/tools/feature_extractor_folderwise_segmentation.py vit_large_patch14_clip_224.laion2b features_vit-laion2b_pixel-level val
python Codes/tools/feature_extractor_folderwise_segmentation.py vit_large_patch14_clip_224.laion2b features_vit-laion2b_pixel-level trn
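The extractor script above handles this for the whole dataset; conceptually, grabbing pixel-level (patch-token) features from the CLIP ViT via timm looks roughly like the sketch below (the image path is a placeholder, and the exact token layout is our assumption):

```python
import timm
import torch
from PIL import Image

# Conceptual sketch of pixel-level (patch-token) feature extraction with the
# CLIP ViT named in the command above; the image path is a placeholder.
model = timm.create_model("vit_large_patch14_clip_224.laion2b",
                          pretrained=True, num_classes=0).eval()
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

img = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    tokens = model.forward_features(img)  # (1, 1 + 16*16 patches, 1024)
patch_feats = tokens[:, 1:, :]            # drop the CLS token; keep per-patch
```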
Then, we compute a similarity matrix from these features and extract the names of the top-50 most similar prompts.
python Codes/tools/calculate_similariity.py features_vit-laion2b_pixel-level val trn
python Codes/tools/calculate_similariity.py features_vit-laion2b_pixel-level trn trn
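Under the hood, this step amounts to a cosine-similarity matrix followed by a top-k lookup. A minimal sketch with random stand-in features (shapes and loading are placeholders, not the script's actual I/O):

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the retrieval step with random stand-in features;
# actual feature loading and saving of prompt names happen in the script.
val_feats = torch.randn(100, 1024)    # (num_val, D)
trn_feats = torch.randn(5000, 1024)   # (num_train, D)

sim = F.normalize(val_feats, dim=-1) @ F.normalize(trn_feats, dim=-1).T
top50 = sim.topk(50, dim=-1).indices  # top-50 most similar prompts per query
```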
Next, we preprocess the features so that they can be used directly as embeddings for the visual prompts and queries.
python Codes/tools/calculate_pre_feature_for_query.py
python Codes/tools/calculate_pre_feature_for_support.py
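A rough sketch of the intent, with toy shapes (the scripts' actual feature sizes and on-disk layout may differ, and the file name below is a placeholder): gather each query's retrieved prompt features and cache everything in a layout the training code can consume directly.

```python
import torch

# Toy shapes only; the actual scripts' feature sizes and on-disk layout
# may differ, and the file name below is a placeholder.
num_trn, num_val, tokens, dim = 500, 10, 16, 64
trn_feats = torch.randn(num_trn, tokens, dim)     # cached train features
val_feats = torch.randn(num_val, tokens, dim)     # cached val features
top50 = torch.randint(0, num_trn, (num_val, 50))  # retrieval indices from above

query_emb = val_feats                   # queries: used as embeddings directly
support_emb = trn_feats[top50]          # (num_val, 50, tokens, dim) prompts
torch.save({"query": query_emb, "support": support_emb}, "pre_features.pt")
```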
python3 Codes/train_vp_segmentation.py \
--mode spimg_spmask \
--output_dir Data/output/logs/ \
--device cuda:0 \
--base_dir Data/pascal-5i/ \
--batch-size 16 \
--lr 0.03 \
--epoch 150 \
--scheduler cosinewarm \
--arr a1 \
--vp-model Prompt \
--p-eps 1 \
--ckpt Data/weights/checkpoint-1000.pth \
--vq_ckpt_dir Data/weights/vqgan \
--save_base_dir Data/ \
--simidx 16 \
--dropout 0.25 \
--choice Zero \
--loss_mean 1 \
--align_q 0 \
--fold 3
<fold>: fold id of Pascal-5i / COCO-5i.
<simidx>: number of prompt pairs.
- Replace train_vp_segmentation.py with train_vp_detection.py to train for single-object detection.
- Replace train_vp_segmentation.py with train_vp_coloring.py and replace --base_dir Data/pascal-5i/ with --base_dir Data/imagenet/ to train for coloring (see the example command after this list).
- Change the value of simidx to set the number of prompt pairs used during training.
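For example, combining the first two substitutions above, a coloring training run looks like this, with all other flags carried over unchanged from the segmentation command:

python3 Codes/train_vp_coloring.py \
--mode spimg_spmask \
--output_dir Data/output/logs/ \
--device cuda:0 \
--base_dir Data/imagenet/ \
--batch-size 16 \
--lr 0.03 \
--epoch 150 \
--scheduler cosinewarm \
--arr a1 \
--vp-model Prompt \
--p-eps 1 \
--ckpt Data/weights/checkpoint-1000.pth \
--vq_ckpt_dir Data/weights/vqgan \
--save_base_dir Data/ \
--simidx 16 \
--dropout 0.25 \
--choice Zero \
--loss_mean 1 \
--align_q 0 \
--fold 3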
The logs and model checkpoints will be generated under the Data/output/logs/ and Data/save_ours_ckpt/ folders, respectively.
We provide the evaluation code for model checkpoints (if they exist). The test command is as follows:
python3 Codes/val_vp_segmentation.py \
--fold 1 \
--mode spimg_spmask \
--output_dir Data/output/logs/ \
--device cuda:0 \
--base_dir Data/pascal-5i/ \
--batch-size 8 \
--lr 0.03 \
--epoch 150 \
--arr a1 \
--vp-model Prompt \
--p-eps 1 \
--ckpt Data/weights/checkpoint-1000.pth \
--vq_ckpt_dir Data/weights/vqgan \
--save_base_dir Data/ \
--simidx 1 \
--dropout 0.25 \
--align_q 0 \
--save_model_path SAVE_MODEL_PATH
- Replace val_vp_segmentation.py with val_vp_detection.py to run inference for single-object detection.
- Replace val_vp_segmentation.py with val_vp_coloring.py and replace --base_dir Data/pascal-5i/ with --base_dir Data/imagenet/ to run inference for coloring.
- Change the value of simidx to set the number of prompt pairs used during inference.
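For reference, the segmentation metric reported below is mean IoU. A minimal sketch of how it can be computed over binary masks is shown here (the repository's own evaluation code may differ in details such as averaging order and thresholding):

```python
import torch

def mean_iou(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6) -> float:
    """Mean IoU over a batch of binary masks of shape (B, H, W)."""
    pred, gt = pred.bool(), gt.bool()
    inter = (pred & gt).flatten(1).sum(-1).float()  # per-image intersection
    union = (pred | gt).flatten(1).sum(-1).float()  # per-image union
    return ((inter + eps) / (union + eps)).mean().item()

# Sanity check: identical masks give IoU = 1.0.
assert abs(mean_iou(torch.ones(2, 64, 64), torch.ones(2, 64, 64)) - 1.0) < 1e-4
```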
To make inference easier to reproduce, we also provide a simple bash script. To run it, navigate to the root directory of CVPR25-Condenser and execute the following command:
bash Codes/script/run01.sh
This will complete the relevant inference tasks.
Download the checkpoint to the Data/ckpt path, then run the corresponding .sh file for one-click execution to directly reproduce the results shown in the table below.
| Task (Metric) | Dataset | Fold | Performance (K=1) | Log (K=1) | Checkpoint (K=1) | Script (K=1) | Performance (K=16) | Log (K=16) | Checkpoint (K=16) | Script (K=16) |
|---|---|---|---|---|---|---|---|---|---|---|
| Segmentation (mIoU↑) | Pascal-5i | Fold 0 | 42.13 | Seg_K_1_Fold_0_Log | Seg_K_1_Fold_0.pth | run01.sh | 45.53 | Seg_K_16_Fold_0_Log | Seg_K_16_Fold_0.pth | run02.sh |
| | | Fold 1 | 50.31 | Seg_K_1_Fold_1_Log | Seg_K_1_Fold_1.pth | run03.sh | 52.06 | Seg_K_16_Fold_1_Log | Seg_K_16_Fold_1.pth | run04.sh |
| | | Fold 2 | 42.20 | Seg_K_1_Fold_2_Log | Seg_K_1_Fold_2.pth | run05.sh | 44.33 | Seg_K_16_Fold_2_Log | Seg_K_16_Fold_2.pth | run06.sh |
| | | Fold 3 | 41.90 | Seg_K_1_Fold_3_Log | Seg_K_1_Fold_3.pth | run07.sh | 44.58 | Seg_K_16_Fold_3_Log | Seg_K_16_Fold_3.pth | run08.sh |
| Detection (mIoU↑) | Pascal VOC 2012 | - | 43.22 | Det_K_1_Log | Det_K_1.pth | run09.sh | 44.64 | Det_K_16_Log | Det_K_16.pth | run10.sh |
| Coloring (MSE↓) | ImageNet-1K | - | 0.56 | Col_K_1_Log | Col_K_1.pth | run11.sh | 0.54 | Col_K_16_Log | Col_K_16.pth | run12.sh |
We have also open-sourced the experiment logs and checkpoints for the domain adaptation experiments, where models were trained on COCO-5i and tested on Pascal-5i.
| Task (Metric) | Dataset | Fold | Performance (K=1) | Log (K=1) | Checkpoint (K=1) | Script (K=1) | Performance (K=16) | Log (K=16) | Checkpoint (K=16) | Script (K=16) |
|---|---|---|---|---|---|---|---|---|---|---|
| Segmentation (mIoU↑) | COCO-5i | Fold 0 | 40.39 | Coco_K_1_Fold_0_Log | Coco_K_1_Fold_0.pth | run13.sh | 40.37 | Coco_K_16_Fold_0_Log | Coco_K_16_Fold_0.pth | run14.sh |
| | | Fold 1 | 44.54 | Coco_K_1_Fold_1_Log | Coco_K_1_Fold_1.pth | run15.sh | 44.85 | Coco_K_16_Fold_1_Log | Coco_K_16_Fold_1.pth | run16.sh |
| | | Fold 2 | 40.23 | Coco_K_1_Fold_2_Log | Coco_K_1_Fold_2.pth | run17.sh | 41.03 | Coco_K_16_Fold_2_Log | Coco_K_16_Fold_2.pth | run18.sh |
| | | Fold 3 | 36.33 | Coco_K_1_Fold_3_Log | Coco_K_1_Fold_3.pth | run19.sh | 35.84 | Coco_K_16_Fold_3_Log | Coco_K_16_Fold_3.pth | run20.sh |
We also provide visualized results for some test cases to give readers an intuitive understanding.
If you find our code useful or use the toolkit in your work, please consider citing:
@inproceedings{Wang25_Condenser,
author={Wang, Jinpeng and Luo, Tianci and Zha, Yaohua and Feng, Yan and Luo, Ruisheng and Chen, Bin and Dai, Tao and Chen, Long and Wang, Yaowei and Xia, Shu-Tao},
title={Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
This code is based on our previous work InMeMo.
We are also grateful to other teams for open-sourcing code that inspired our work, including Visual Prompting, visual_prompt_retrieval, timm, and ILM-VP.
If you have any questions, you can raise an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply soon.