This repository collects research papers on unsupervised learning and test-time adaptation methods for vision-language models (VLMs). It will be continuously updated to track the latest work in the community.
Keywords: unsupervised learning, test-time adaptation, vision-language models.
- [Aug 7, 2025] Our survey manuscript is released on arXiv.
- Data-Free Transfer
- Unsupervised Domain Transfer
- Episodic Test-Time Adaptation
- Online Test-Time Adaptation
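All of the settings above adapt a pretrained VLM such as CLIP without labels, starting from CLIP-style zero-shot inference: each class name is embedded as text, compared to the image embedding by cosine similarity, and the scaled similarities are softmaxed into class probabilities. The sketch below is a minimal, dependency-free illustration of that pipeline (the vectors are toy stand-ins, not real CLIP embeddings); the entropy function shows the kind of unsupervised objective that TPT-style test-time adaptation methods minimize over augmented views:

```python
import math

def l2_normalize(v):
    """Project a vector onto the unit sphere, as CLIP does with its embeddings."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    """CLIP-style zero-shot prediction: cosine similarity between the image
    embedding and one text embedding per class, softmaxed with a temperature
    (CLIP's learned logit scale is roughly 100, i.e. temperature ~0.01)."""
    img = l2_normalize(image_emb)
    logits = []
    for t in text_embs:
        t = l2_normalize(t)
        logits.append(sum(a * b for a, b in zip(img, t)) / temperature)
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def prediction_entropy(probs):
    """Entropy of the predicted class distribution; many episodic and online
    test-time adaptation methods minimize this without any labels."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Episodic methods reset the model after each test sample, while online methods carry adapted state across the test stream; both typically drive their updates with unsupervised objectives like the entropy above.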
Please see our survey, Adapting Vision-Language Models Without Labels: A Comprehensive Survey, for more details. If you find our paper and this repository helpful, please consider citing:
@article{dong2025adapting,
  title={Adapting Vision-Language Models Without Labels: A Comprehensive Survey},
  author={Dong, Hao and Sheng, Lijun and Liang, Jian and He, Ran and Chatzi, Eleni and Fink, Olga},
  journal={arXiv preprint arXiv:2508.05547},
  year={2025}
}
- [ICCV-2025] FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
- [ICCV-2025] Generate, Transduct, Adapt: Iterative Transduction with VLMs
- [ICCV-2025] BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
- [ICCV-2025] Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation
- [ICML-2025] From Local Details to Global Context: Advancing Vision-Language Models with Attention-based Selection
- [ICML-2025] GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models
- [CVPR-2025] CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
- [CVPR-2025] TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
- [CVPR-2025] SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting
- [CVPR-2025] O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
- [CVPR-2025] R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
- [CVPR-2025] Realistic Test-Time Adaptation of Vision-Language Models
- [CVPR-2025] SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models
- [CVPR-2025] Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM
- [CVPR-2025] Bayesian Test-Time Adaptation for Vision-Language Models
- [CVPR-2025] COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation
- [CVPR-2025] Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation
- [CVPR-2025] On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach
- [ICLR-2025] RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
- [ICLR-2025] Noisy Test-Time Adaptation in Vision-Language Models
- [ICLR-2025] Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model
- [ICLR-2025] DynaPrompt: Dynamic Test-Time Prompt Tuning
- [ICLR-2025] Test-time Adaptation for Cross-modal Retrieval with Query Shift
- [AAAI-2025] Learning to Prompt with Text Only Supervision for Vision-Language Models
- [AAAI-2025] Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
- [WACV-2025] Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
- [WACV-2025] Enhancing Visual Classification using Comparative Descriptors
- [WACV-2025] DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
- [WACV-2025] LATTECLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts
- [WACV-2025] Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models
- [WACV-2025] Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
- [WACV-2025] CLIPArTT: Adaptation of CLIP to New Domains at Test Time
- [IJCV-2025] Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation
- [TIP-2025] Task-to-Instance Prompt Learning for Vision-Language Models at Test Time
- [PR-2025] A Closer Look at the Explainability of Contrastive Language-Image Pre-training
- [PR-2025] CTPT: Continual Test-time Prompt Tuning for Vision-Language Models
- [NeurIPS-2024] Boosting Vision-Language Models with Transduction
- [NeurIPS-2024] OTTER: Effortless Label Distribution Adaptation of Zero-shot Models
- [NeurIPS-2024] Frustratingly Easy Test-Time Adaptation of Vision-Language Models
- [NeurIPS-2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
- [NeurIPS-2024] WATT: Weight Average Test-Time Adaptation of CLIP
- [NeurIPS-2024] Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models
- [NeurIPS-2024] BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping
- [NeurIPS-2024] Historical Test-time Prompt Tuning for Vision Foundation Models
- [ACMMM-2024] WaveDN: A Wavelet-based Training-free Zero-shot Enhancement for Vision-Language Models
- [ACMMM-2024] Towards Robustness Prompt Tuning with Fully Test-Time Adaptation for CLIP’s Zero-Shot Generalization
- [ECCV-2024] Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
- [ECCV-2024] SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
- [ECCV-2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
- [ECCV-2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
- [ECCV-2024] TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection
- [ECCV-2024] Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
- [ECCV-2024] uCAP: An Unsupervised Prompting Method for Vision-Language Models
- [ECCV-2024] Robust Calibration of Large Vision-Language Adapters
- [ECCV-2024] Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
- [ECCV-2024] In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
- [ECCV-2024] Online Zero-Shot Classification with CLIP
- [IJCAI-2024] DTS-TPT: Dual Temporal-Sync Test-time Prompt Tuning for Zero-shot Activity Recognition
- [ICML-2024] Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection
- [ICML-2024] Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization
- [ICML-2024] Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
- [ICML-2024] Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
- [ICML-2024] Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
- [CVPR-2024] Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
- [CVPR-2024] The Neglected Tails in Vision-Language Models
- [CVPR-2024] PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
- [CVPR-2024] Label Propagation for Zero-shot Classification with Vision-Language Models
- [CVPR-2024] Transductive Zero-Shot and Few-Shot CLIP
- [CVPR-2024] On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?
- [CVPR-2024] Test-Time Zero-Shot Temporal Action Localization
- [CVPR-2024] Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
- [CVPR-2024] Efficient Test-Time Adaptation of Vision-Language Models
- [CVPR-2024] Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
- [CVPR-2024] Improved Self-Training for Test-Time Adaptation
- [CVPR-2024] Any-Shift Prompting for Generalization over Distributions
- [ICLR-2024] Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
- [ICLR-2024] C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion
- [ICLR-2024] PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts
- [ICLR-2024] Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
- [AAAI-2024] Robust Test-Time Adaptation for Zero-Shot Prompt Tuning
- [AAAI-2024] DART: Dual-Modal Adaptive Online Prompting and Knowledge Retention for Test-Time Adaptation
- [WACV-2024] ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
- [WACV-2024] CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free
- [WACV-2024] DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification
- [NeurIPS-2023] Neural Priming for Sample-Efficient Adaptation
- [NeurIPS-2023] ChatGPT-Powered Hierarchical Comparisons for Image Classification
- [NeurIPS-2023] LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
- [NeurIPS-2023] Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning
- [NeurIPS-2023] Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP
- [NeurIPS-2023] SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models
- [NeurIPS-2023] Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback
- [NeurIPS-2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
- [NeurIPS-2023] Test-Time Distribution Normalization for Contrastively Learned Vision-language Models
- [ACMMM-2023] VPA: Fully Test-Time Visual Prompt Adaptation
- [ICCV-2023] What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification
- [ICCV-2023] SuS-X: Training-Free Name-Only Transfer of Vision-Language Models
- [ICCV-2023] Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
- [ICCV-2023] Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
- [ICML-2023] CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
- [ICML-2023] A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models
- [ICML-2023] POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models
- [CVPR-2023] Texts as Images in Prompt Tuning for Multi-Label Image Recognition
- [CVPR-2023] Improving Zero-shot Generalization and Robustness of Multi-modal Models
- [ICLR-2023] Visual Classification via Description from Large Language Models
- [ICLR-2023] Masked Unsupervised Self-training for Label-free Image Classification
- [AAAI-2023] CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
- [NeurIPS-2022] ReCo: Retrieve and Co-segment for Zero-shot Transfer
- [NeurIPS-2022] Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
- [ECCV-2022] Extract Free Dense Labels from CLIP
- [ICML-2021] Learning Transferable Visual Models From Natural Language Supervision