Zheng Li1 · Yibing Song2 · Ming-Ming Cheng1 · Xiang Li1 · Jian Yang1
1Nankai University, 2Damo Academy, Alibaba Group
ICCV 2025
[Paper] [Project Page] [Chinese Interpretation] [Chinese Translation]
- If you are interested in prompt learning and want to know more about related work, we also maintain a list of awesome papers for your reference.
- If you are trying to reproduce the results of this implementation on the StanfordCars dataset, note that the original download link may be broken and unavailable. We have provided the dataset in GitHub releases for your convenience.
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Zheng Li, Xiang Li, Xinyi Fu, Xin Zhang, Weiqiang Wang, Shuo Chen, Jian Yang.
CVPR 2024
[Paper] [Code] [Project Page] [Poster] [Chinese Paper Interpretation] [Video Explanation] [Chinese Translation]
In this work, we introduce an attribute-anchored textual prompt learning method for vision-language models, named ATPrompt.
This method extends the learning space of soft prompts from the original one-dimensional category level to the multi-dimensional attribute level by incorporating multiple universal attribute tokens into the learnable soft prompts.
Guided by these attributes, soft tokens acquire not only category-specific but also attribute-related general representations during training, thereby enhancing the alignment between images and unknown categories compared to the original method.
Figure 1. Architectural comparison among vanilla CLIP, classic prompt learning, and our proposed attribute-anchored prompt learning.
Figure 2. An illustration of the computation process for the shallow and deep versions.
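For intuition, the following is a minimal, hypothetical PyTorch sketch of the core idea: frozen attribute word embeddings interleaved with learnable soft tokens before the class token. The exact token layout, token counts, and the deep-version handling follow the paper and the trainer code in this repo, not this sketch.

```python
import torch
import torch.nn as nn

class AttributeAnchoredPrompt(nn.Module):
    """Minimal sketch (not the repo implementation):
    layout [soft | attr_1 | soft | attr_2 | soft | class]."""

    def __init__(self, attr_embeds, cls_embed, n_soft=4):
        # attr_embeds: list of (dim,) attribute word embeddings used as frozen anchors
        # cls_embed:   (1, dim) class name embedding
        super().__init__()
        dim = cls_embed.shape[-1]
        # One block of learnable soft tokens around each attribute anchor.
        self.soft = nn.ParameterList(
            [nn.Parameter(0.02 * torch.randn(n_soft, dim)) for _ in range(len(attr_embeds) + 1)]
        )
        self.register_buffer("attrs", torch.stack(attr_embeds))  # (A, dim), not optimized
        self.register_buffer("cls", cls_embed)                   # (1, dim), not optimized

    def forward(self):
        parts = [self.soft[0]]
        for i in range(self.attrs.size(0)):
            parts += [self.attrs[i : i + 1], self.soft[i + 1]]
        parts.append(self.cls)
        # Token sequence that would be fed to the CLIP text encoder.
        return torch.cat(parts, dim=0)

# Example: two attribute anchors (e.g. "color", "shape") in a 512-d embedding space.
dim = 512
prompt = AttributeAnchoredPrompt([torch.randn(dim), torch.randn(dim)], torch.randn(1, dim))
print(prompt().shape)  # torch.Size([15, 512]) with n_soft=4 and 2 attributes
```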
- Create the environment and install the Dassl.pytorch library. Please follow the instructions detailed in [INSTALL.md].
- Prepare the datasets. Please follow the instructions detailed in [DATASETS.md]. If you are unable to access the StanfordCars dataset, we have provided the dataset in [GitHub releases] for your convenience.
- (Optional) Download the original ViT-B/16 and ViT-L/14 CLIP model weights from the official OpenAI website and place them in the ./clip folder. Then comment out line 42 in trainers/coop.py and uncomment line 43. (A quick load check is sketched after this list.)
  [ViT-B/16 CLIP] [ViT-L/14 CLIP]
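To verify that the downloaded checkpoints load correctly before training, a quick check like the one below can help. It assumes the standard clip.load API and a checkpoint saved as ./clip/ViT-B-16.pt; adjust the path to whatever file name the download produced.

```python
# Optional sanity check for the locally downloaded CLIP weights
# (assumed path/name: ./clip/ViT-B-16.pt).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("./clip/ViT-B-16.pt", device=device)
print("CLIP loaded:", sum(p.numel() for p in model.parameters()), "parameters")
```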
For more practical information about the attribute search process, please refer to [Attribute_Search.md].
(1) Directly use our results.
Here we provide the five attribute bases obtained by querying the LLM (GPT-4o) and the final result after the differentiable attribute search for each dataset. You can directly use our results for subsequent training. (A conceptual sketch of the search is given after the table.)
The attribute lists are shown in the table below:
| Dataset | Attribute Bases | Searched Results |
|---|---|---|
| ImageNet-1K | color, size, shape, habitat, behavior | (color, shape) |
| Caltech101 | shape, color, material, function, size | (shape, size) |
| Oxford Pets | loyalty, affection, playfulness, energy, intelligence | (playfulness, energy) |
| Stanford Cars | design, engine, performance, luxury, color | (luxury) |
| Flowers-102 | color, flower, habitat, growth, season | (color, habitat, growth) |
| Food-101 | flavor, texture, origin, ingredients, preparation | (flavor, preparation) |
| FGVC Aircraft | design, capacity, range, engines, liveries | (design, range) |
| SUN-397 | architecture, environment, structure, design, function | (function) |
| DTD | pattern, texture, color, design, structure | (pattern, color, design) |
| EuroSAT | habitat, foliage, infrastructure, terrain, watercourse | (habitat) |
| UCF-101 | precision, coordination, technique, strength, control | (precision) |
Table 1. Attribute bases and searched results for each dataset.
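The "Searched Results" column is produced by the differentiable attribute search. The snippet below is only a conceptual sketch of such a search, assuming a DARTS-style relaxation in which every combination of the attribute bases gets a learnable weight; candidate_loss is a placeholder for the actual prompt-learning objective evaluated with those attributes as anchors.

```python
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

bases = ["color", "size", "shape", "habitat", "behavior"]  # e.g. the ImageNet-1K bases
candidates = [c for r in range(1, len(bases) + 1)
              for c in itertools.combinations(bases, r)]

alpha = nn.Parameter(torch.zeros(len(candidates)))  # one learnable logit per candidate
optimizer = torch.optim.Adam([alpha], lr=5e-3)

def candidate_loss(cand):
    # Placeholder: in the real search this is the prompt-learning loss obtained when
    # the soft prompt is anchored with the attributes in `cand`.
    return torch.rand(())

for step in range(100):
    weights = F.softmax(alpha, dim=0)
    loss = sum(w * candidate_loss(c) for w, c in zip(weights, candidates))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Keep the candidate with the highest confidence at the end of the search.
best = candidates[int(F.softmax(alpha, dim=0).argmax())]
print("highest-confidence attribute combination:", best)
```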
(2) Reproduce the whole process on your own.
- Register a ChatGPT service account (we use ZhiZengZeng) and enter the API key in gpt_query.py line 27. Then run the following code:
python gpt_query.py
Running this code will give you five output attributes. (You can change the input prompt in gpt_query.py line 94 to specify as many attributes as you want. A minimal query sketch is also provided after these steps.)
- Enter the five attributes into the variables ATT1_TEXT, ATT2_TEXT, ATT3_TEXT, ATT4_TEXT and ATT5_TEXT in scripts/attribute_compute/main.sh. Then run the attribute search code:
sh scripts/attribute_compute/main.sh
Select the result with the highest confidence in the last epoch as our target attribute.
In the <Training Logs & Weights> section below, we provide the complete attribute search logs on ten datasets for your reference.
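For reference, the attribute-base query boils down to asking an LLM for a short list of universal attributes. A minimal, hypothetical version using the openai Python client is shown below; the actual prompt wording, model name, and proxy settings live in gpt_query.py and may differ.

```python
# Hypothetical stand-in for the query in gpt_query.py (prompt wording and model are assumptions).
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # pass base_url=... if you use a proxy service

prompt = (
    "List five universal visual attributes (single words) that help distinguish the "
    "categories of the Caltech101 dataset. Answer with a comma-separated list."
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
attributes = [a.strip() for a in response.choices[0].message.content.split(",")]
print(attributes)  # e.g. ['shape', 'color', 'material', 'function', 'size']
```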
Here we take the CoOp+ATPrompt method as an example. You can switch to other baseline methods if you want.
In [ATPrompt.md], we provide full implementation details for researchers to reproduce our results.
(1) Base-to-Novel Experiments.
- The config files for each baseline method are provided in configs/trainers/. You can modify the hyper-parameters in these config files.
- Change the DATA variable in scripts/coop/base2new_train.sh line 4 to your current dataset path.
- Run the following commands to train the model with the ATPrompt method:
🚀 Training:
# CoOp+ATPrompt, dataset=imagenet
sh scripts/coop/atp_base2new_train.sh imagenet
# CoOp+ATPrompt, dataset=caltech101
sh scripts/coop/atp_base2new_train.sh caltech101
⚡ Testing:
# CoOp+ATPrompt, dataset=caltech101
sh scripts/coop/atp_base2new_test.sh caltech101
If you don't want to use ATPrompt, you can set TRAINER.ATPROMPT.USE_ATPROMPT in scripts/coop/base2new_train.sh line 31 to False. Alternatively, you can run the following command:
# Vanilla CoOp
sh scripts/coop/vanilla_base2new_train.sh imagenet
(2) Cross-dataset & Domain Generalization Experiments.
- Change the DATA variable in scripts/coop/xd_train.sh line 4 to your current dataset path.
- Train the model on the source dataset (ImageNet) and select the best-performing model:
sh scripts/coop/xd_train.sh
- After training, evaluate the model on the other recognition datasets. For example, if the model trained with seed 1 performs best, evaluate it as follows (a helper that loops over all target datasets is sketched after these commands):
# Cross-dataset
# dataset=caltech101, seed=1
sh scripts/coop/xd_eval.sh caltech101 1
# Domain Generalization
# dataset=imagenet_a, seed=1
sh scripts/coop/xd_eval.sh imagenet_a 1
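If you want to run the evaluation over every target dataset in one go, a small wrapper like the one below can call the script repeatedly. The dataset names are assumptions based on standard CoOp-style config names; match them to the configs in this repo before running.

```python
# Hypothetical convenience loop around scripts/coop/xd_eval.sh
# (dataset names are assumptions; check the dataset configs for the exact ones).
import subprocess

targets = ["caltech101", "oxford_pets", "stanford_cars", "oxford_flowers", "food101",
           "fgvc_aircraft", "sun397", "dtd", "eurosat", "ucf101"]
seed = 1  # the best-performing ImageNet-trained seed

for dataset in targets:
    subprocess.run(["sh", "scripts/coop/xd_eval.sh", dataset, str(seed)], check=True)
```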
In the following part, we provide the complete training logs and model weights of CoOp+ATPrompt for your reference.
The results are averaged over 3 seeds. Note that due to the limited number of training samples and network parameters, the performance results may fluctuate. If you cannot achieve the reported results, please run more experiments with different seeds.
Result figures: base-to-novel generalization results of CoOp+ATPrompt.
- Attribute Search. We provide the complete attribute search logs on ten datasets for your reference. [Github Releases]
- Base-to-Novel Generalization (CoOp+ATPrompt). We provide the complete training logs and model weights on 11 datasets for your reference. [Github Releases]
- Cross-dataset Prompt Learning (CoOp+ATPrompt). We provide the model weights and training logs trained on the source dataset (ImageNet) under the cross-dataset setting. [Github Releases]
If you have any questions, you can submit an issue on GitHub, or contact me by email (zhengli97 [at] qq.com).
If you find our paper or repo helpful for your research, please consider citing the following paper and giving this repo a star. Thank you!
@article{li2024advancing,
title={Advancing Textual Prompt Learning with Anchored Attributes},
author={Li, Zheng and Song, Yibing and Cheng, Ming-Ming and Li, Xiang and Yang, Jian},
journal={arXiv preprint arXiv:2412.09442},
year={2024}
}
@inproceedings{li2024promptkd,
title={Promptkd: Unsupervised prompt distillation for vision-language models},
author={Li, Zheng and Li, Xiang and Fu, Xinyi and Zhang, Xin and Wang, Weiqiang and Chen, Shuo and Yang, Jian},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={26617--26626},
year={2024}
}