This is the codebase for GTA-CLIP, proposed in:

Generate, Transduct, Adapt: Iterative Transduction with VLMs
Oindrila Saha, Logan Lawrence, Grant Van Horn, Subhransu Maji
ICCV 2025
(a) Vision-language models (VLMs) such as CLIP enable zero-shot classification using similarity between text embeddings of class prompts and images.
(b) Transduction exploits the structure of the entire image dataset to assign images to classes, improving accuracy.
(c) Our approach, GTA-CLIP, iteratively
(i) induces structure over classes in the language space by generating attributes based on pairwise confusions,
(ii) performs attribute-augmented transductive inference, and
(iii) adapts the CLIP encoders using the inferred labels; a toy sketch of this loop follows below.
(d) Across 12 datasets, we improve upon CLIP and transductive CLIP by 8.6% and 4.0% respectively using ViT-B/32, with similar gains for other encoders. Significant improvements are also reported in the few-shot setting.
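To make the generate → transduct → adapt loop in (c) concrete, here is a toy, self-contained PyTorch sketch. Everything in it is a stand-in chosen for illustration: random features replace CLIP embeddings, randomly drawn vectors replace LLM-generated attributes, a single soft-assignment step replaces TransCLIP-style transductive inference, and a linear layer replaces encoder adaptation. The actual method is implemented in run_gtaclip.py.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, D = 200, 10, 64                                    # images, classes, embedding dim (toy sizes)
image_feats = F.normalize(torch.randn(N, D), dim=-1)     # stands in for CLIP image embeddings
class_feats = F.normalize(torch.randn(C, D), dim=-1)     # stands in for class-prompt text embeddings
attr_feats = [[] for _ in range(C)]                      # per-class attribute embeddings, grown each round

adapter = torch.nn.Linear(D, D, bias=False)              # stands in for encoder adaptation
with torch.no_grad():
    adapter.weight.copy_(torch.eye(D))                   # start from the "pretrained" (identity) mapping
opt = torch.optim.SGD(adapter.parameters(), lr=0.1)

for round_idx in range(3):
    # (ii) transductive step (simplified): soft assignments from image-to-prompt similarity
    prompts = torch.stack([
        F.normalize(torch.stack([class_feats[c], *attr_feats[c]]).mean(0), dim=-1)
        for c in range(C)
    ])
    feats = F.normalize(adapter(image_feats), dim=-1)
    logits = 100.0 * feats @ prompts.t()
    probs = logits.softmax(dim=-1)

    # (i) find the most confused class pair and "generate" one attribute for each
    with torch.no_grad():
        conf = probs.t() @ probs                         # co-activation between classes
        conf.fill_diagonal_(0.0)
        c1, c2 = divmod(int(conf.argmax()), C)
    for c in (c1, c2):                                   # in GTA-CLIP an LLM supplies these attributes
        attr_feats[c].append(F.normalize(torch.randn(D), dim=-1))

    # (iii) adapt the encoder on the inferred (pseudo) labels
    loss = F.cross_entropy(logits, probs.argmax(dim=-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"round {round_idx}: most confused pair ({c1}, {c2}), self-training loss {loss.item():.3f}")
```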
Create a conda environment with the following specifications:

```bash
conda create -y --name GTACLIP python=3.10.0
conda activate GTACLIP
pip3 install -r requirements.txt
export TOKENIZERS_PARALLELISM=true
```
Please follow DATASETS.md to install the datasets. For the CUB dataset, follow the instructions in AdaptCLIPZS.
Download the "gpt_descriptions" folder from AdaptCLIPZS.
```bash
python run_gtaclip.py --dataset <dataset_name> --root_path </path/to/datasets/folder> --backbone <clip_backbone> --gpt_path </path/to/adaptclipzs/visual/attributes> --gpt_path_location </path/to/adaptclipzs/location/attributes>
```
On completion, this code will print the accuracies of base CLIP, TransCLIP, and GTA-CLIP for the specified dataset. --root_path should point to the folder containing all the datasets. --backbone is the CLIP architecture, e.g. 'vit_b16'. --gpt_path is the path to the folder containing the GPT-generated attributes for the specific dataset, which can be obtained from AdaptCLIPZS. Note that only the CUB and Flowers datasets have --gpt_path_location attributes. The results should be close to those reported in the paper.
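For example, a typical invocation might look like the following; the dataset key (`cub`), the paths, and the attribute folder names here are placeholders, so check run_gtaclip.py and the AdaptCLIPZS release for the exact names expected on your setup:

```bash
python run_gtaclip.py --dataset cub --root_path ~/datasets \
    --backbone vit_b16 \
    --gpt_path ~/gpt_descriptions/CUB \
    --gpt_path_location ~/gpt_descriptions/CUB_location
```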
TODO: code for few-shot results.
Thanks to the authors of TransCLIP for releasing their codebase, on which our code is built.
If you find our work useful, please consider citing:
```bibtex
@article{saha2025generate,
  title={Generate, Transduct, Adapt: Iterative Transduction with VLMs},
  author={Saha, Oindrila and Lawrence, Logan and Van Horn, Grant and Maji, Subhransu},
  journal={arXiv preprint arXiv:2501.06031},
  year={2025}
}
```