
Generate, Transduct, Adapt: Iterative Transduction with VLMs

This is the codebase for GTA-CLIP, proposed in

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Van Horn, Subhransu Maji

ICCV'2025

Figure: Overview of GTA-CLIP

(a) Vision-language models (VLMs) such as CLIP enable zero-shot classification using the similarity between text embeddings of class prompts and image embeddings.
(b) Transduction exploits the structure of the entire image dataset to assign images to classes, improving accuracy.
(c) Our approach, GTA-CLIP, iteratively
  (i) induces structure over classes in language space by generating attributes based on pairwise confusions,
  (ii) performs attribute-augmented transductive inference, and
  (iii) adapts the CLIP encoders using the inferred labels (a minimal sketch of this loop is given below).
(d) Across 12 datasets, we improve upon CLIP and transductive CLIP by 8.6% and 4.0%, respectively, using ViT-B/32, with similar gains for other encoders. Significant improvements are also reported in the few-shot setting.
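
To make the loop in (c) concrete, here is a minimal, self-contained Python sketch. It is not the authors' implementation: the CLIP embeddings are random stand-ins, attribute generation is mocked as a small perturbation of the most-confused class prompts, transductive inference is approximated by Sinkhorn-style balancing of per-image softmax scores, and encoder adaptation is a no-op placeholder. The actual method is implemented in run_gtaclip.py.

```python
# Conceptual sketch of the generate -> transduct -> adapt loop (not the paper's algorithm).
import numpy as np

rng = np.random.default_rng(0)
num_images, num_classes, dim = 200, 10, 64

image_feats = rng.normal(size=(num_images, dim))   # stand-in for CLIP image embeddings
class_feats = rng.normal(size=(num_classes, dim))  # stand-in for text embeddings of class prompts


def generate_attributes(confusion, class_feats):
    """Mock LLM attribute generation: nudge the prompts of the most-confused class pair."""
    off_diag = confusion - np.diag(np.diag(confusion))
    i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
    updated = class_feats.copy()
    updated[[i, j]] += 0.05 * rng.normal(size=(2, class_feats.shape[1]))
    return updated


def transductive_inference(image_feats, class_feats, iters=10):
    """Dataset-wide soft assignment: per-image softmax followed by Sinkhorn-style
    balancing, so each image's label depends on the whole collection."""
    sims = image_feats @ class_feats.T
    probs = np.exp(sims - sims.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    for _ in range(iters):
        probs /= probs.sum(axis=0, keepdims=True)  # balance class marginals
        probs /= probs.sum(axis=1, keepdims=True)  # renormalize per image
    return probs


def adapt_encoders(image_feats, class_feats, probs):
    """Placeholder for fine-tuning the CLIP encoders on the inferred pseudo-labels."""
    return image_feats, class_feats


probs = transductive_inference(image_feats, class_feats)        # initial assignment
for step in range(3):
    confusion = probs.T @ probs                                  # rough pairwise class confusion
    class_feats = generate_attributes(confusion, class_feats)    # (i) attributes for confused pairs
    probs = transductive_inference(image_feats, class_feats)     # (ii) attribute-augmented transduction
    image_feats, class_feats = adapt_encoders(image_feats, class_feats, probs)  # (iii) adapt

print(probs.argmax(axis=1)[:10])  # final transductive predictions
```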

Preparation

Create a conda environment with the following specifications:

conda create -y --name GTACLIP python=3.10.0
conda activate GTACLIP
pip3 install -r requirements.txt
export TOKENIZERS_PARALLELISM=true

Datasets

Please follow DATASETS.md to install the datasets. For the CUB dataset, follow the instructions in AdaptCLIPZS.

Static LLM Attributes

Download "gpt_descriptions" from AdaptCLIPZS

Running GTA-CLIP

python run_gtaclip.py --dataset <dataset_name> --root_path </path/to/datasets/folder> --backbone <clip_backbone> --gpt_path </path/to/adaptclipzs/visual/attributes> --gpt_path_location </path/to/adaptclipzs/location/attributes>
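
For example, a run on CUB with the ViT-B/16 backbone might look like the following; the dataset key and attribute paths are illustrative placeholders and should be adjusted to your setup:

python run_gtaclip.py --dataset cub --root_path ~/datasets --backbone vit_b16 --gpt_path ~/gpt_descriptions/CUB/visual --gpt_path_location ~/gpt_descriptions/CUB/location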

On completion, this script prints the accuracies of base CLIP, TransCLIP, and GTA-CLIP for the specified dataset. --root_path should point to the folder containing all the datasets, and --backbone is the CLIP architecture, e.g. 'vit_b16'. --gpt_path is the path to the folder of GPT-generated attributes for the specific dataset, which can be obtained from AdaptCLIPZS. Note that only the CUB and Flowers datasets have --gpt_path_location attributes. The results should be close to the table below:

Table: results across datasets and backbones

Todo: Code for few-shot results

Thanks to TransCLIP for releasing the codebase upon which our code is built.


Citation

If you find our work useful, please consider citing:

@article{saha2025generate,
  title={Generate, Transduct, Adapt: Iterative Transduction with VLMs},
  author={Saha, Oindrila and Lawrence, Logan and Van Horn, Grant and Maji, Subhransu},
  journal={arXiv preprint arXiv:2501.06031},
  year={2025}
}
