DISSect: Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning

By Zihua Zhao, Feng Hong, Mengxi Chen, Pengyi Chen, Benyuan Liu, Jiangchao Yao, Ya Zhang, and Yanfeng Wang, from the Cooperative Medianet Innovation Center at Shanghai Jiao Tong University, the School of AI at Shanghai Jiao Tong University, and Shanghai AI Laboratory.

This paper has been accepted at the International Conference on Computer Vision (ICCV) 2025. This repo is the official PyTorch implementation of DISSect.

⚠️ This repository is being organized and updated continuously. Please note that this version is not the final release.

🚀 Quick Start

Installation

The code builds on the original Open CLIP code from https://github.com/mlfoundations/open_clip, which provides the standard retrieval framework, data loaders, and evaluation metrics. To set up the environment for running our code:

Clone the repository:

git clone https://github.com/MediaBrain-SJTU/DISSect.git
cd DISSect

Install dependencies:

conda create --name DISSect python==3.9
conda activate DISSect
pip install -r requirements.txt

Configuration

Before running the training, you need to configure the following parameters:

Dataset Paths: Update the data paths in your training script. We use the publicly available WebDataset versions of CC3M and CC12M provided by https://huggingface.co/pixparse for efficient data loading. To train on your own data, convert it to WebDataset format first.

--train-data 'path/to/your/cc3m-train-{0000..0575}.tar'
--val-data 'path/to/your/cc3m-validation-{0000..0015}.tar'
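
The `{0000..0575}` part of these paths is a brace range that names a whole set of shard files. As an illustration only, the stdlib-only sketch below expands such a pattern into individual file names. `expand_shards` is a hypothetical helper, not a function from this repository, and it handles only a single zero-padded numeric range.

```python
import re

def expand_shards(pattern: str) -> list[str]:
    """Expand one zero-padded numeric brace range, e.g. 'a-{0000..0002}.tar'."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]  # no brace range: the pattern is a single file
    start, end = match.group(1), match.group(2)
    width = len(start)  # keep the zero padding of the lower bound
    return [
        pattern[:match.start()] + str(i).zfill(width) + pattern[match.end():]
        for i in range(int(start), int(end) + 1)
    ]

shards = expand_shards("path/to/your/cc3m-train-{0000..0575}.tar")
print(len(shards), shards[0], shards[-1])
```

Checking the expanded list against the files on disk is a quick way to catch a wrong shard range before launching a long run.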

Dataset Size: Specify the number of samples in your dataset:

--train-num-samples 2905954  # For CC3M
--train-num-samples 12423374 # For CC12M  
--train-num-samples 14681591 # For YFCC15M
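
A streamed WebDataset has no built-in length, which is presumably why the sample count is passed explicitly. The sketch below shows one way the count could translate into optimizer steps per epoch. The per-GPU batch size of 128, the 4 GPUs, and the ceiling rounding are assumptions for illustration, not values taken from this repository.

```python
import math

def steps_per_epoch(num_samples: int, batch_size_per_gpu: int, world_size: int) -> int:
    """Number of global batches needed to see every sample once (rounded up)."""
    global_batch = batch_size_per_gpu * world_size
    return math.ceil(num_samples / global_batch)

# Sample counts come from the flags above; batch size and GPU count are hypothetical.
for name, n in [("CC3M", 2905954), ("CC12M", 12423374), ("YFCC15M", 14681591)]:
    print(name, steps_per_epoch(n, batch_size_per_gpu=128, world_size=4))
```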

Training

Run training on the CC3M dataset with your preferred selection strategy:

bash train_cc3m.sh

Or customize the parameters:

torchrun --nproc_per_node 4 src/main.py \
    --train-data 'path/to/your/train-data.tar' \
    --val-data 'path/to/your/val-data.tar' \
    --train-num-samples <your-dataset-size> \
    --select \
    --select-strategy 'warmup_base_sampling' \
    --select-rate 0.2 \
    --epochs 40

📊 Supported Selection Strategies

small_loss: Select samples with smallest contrastive loss
big_loss: Select samples with largest contrastive loss
clipscore: Select samples with highest CLIPScore
random: Random selection
historical_base_sampling: Momentum version of DISSect
warmup_base_sampling: Warm-up version of DISSect
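
To make the score-based baselines concrete, here is a minimal, framework-free sketch of picking a subset of a batch by per-sample score. It is not the repository's implementation: `select_indices` is a hypothetical helper, it assumes the selection rate is the fraction of the batch that is kept, and it does not cover `historical_base_sampling` or `warmup_base_sampling`, whose scoring is described in the paper.

```python
import random

def select_indices(scores, rate, strategy, seed=0):
    """Return sorted indices of the samples kept from one batch.

    scores: per-sample contrastive loss (small_loss / big_loss) or
            CLIPScore (clipscore); ignored for random.
    rate:   assumed here to be the fraction of the batch to keep.
    """
    k = max(1, int(len(scores) * rate))
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending
    if strategy == "small_loss":
        kept = order[:k]            # lowest losses
    elif strategy in ("big_loss", "clipscore"):
        kept = order[-k:]           # highest losses / highest CLIPScore
    elif strategy == "random":
        kept = random.Random(seed).sample(range(len(scores)), k)
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return sorted(kept)

losses = [0.9, 0.1, 0.5, 0.3, 0.7]
print(select_indices(losses, 0.4, "small_loss"))
print(select_indices(losses, 0.4, "big_loss"))
```

In an actual training loop the scores would come from a forward pass over the full batch, and only the kept indices would be used for the gradient step.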

🔧 Key Parameters

--select: Enable data selection
--select-strategy: Choose selection strategy
--select-rate: Selection rate (0.0-1.0)
--warmup-point: Number of warmup epochs
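
For orientation, these flags could be declared with `argparse` roughly as follows. This is a sketch only: the types, defaults, and the `choices` list are assumptions inferred from this README, and the actual definitions in `src/` may differ.

```python
import argparse

STRATEGIES = [
    "small_loss", "big_loss", "clipscore", "random",
    "historical_base_sampling", "warmup_base_sampling",
]

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Selection flags (illustrative)")
    parser.add_argument("--select", action="store_true", help="Enable data selection")
    parser.add_argument("--select-strategy", choices=STRATEGIES, default="random",
                        help="Selection strategy (default here is hypothetical)")
    parser.add_argument("--select-rate", type=float, default=1.0,
                        help="Selection rate in [0.0, 1.0]")
    parser.add_argument("--warmup-point", type=int, default=0,
                        help="Number of warmup epochs")
    return parser

args = build_parser().parse_args(
    ["--select", "--select-strategy", "warmup_base_sampling", "--select-rate", "0.2"]
)
print(args.select, args.select_strategy, args.select_rate, args.warmup_point)
```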

📝 Notes

GPU memory usage depends on batch size and model size. We ran our experiments on 8 A100 GPUs.
The extra forward propagation is an inherent overhead of the online batch selection paradigm and can be reduced through further low-level optimizations. The wall-clock time reported in Table 6 of the main paper reflects only the core efficiency of our algorithm.

🤝 Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation:

@inproceedings{zhao2025differential,
  title={Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning},
  author={Zhao, Zihua and Hong, Feng and Chen, Mengxi and Chen, Pengyi and Liu, Benyuan and Yao, Jiangchao and Zhang, Ya and Wang, Yanfeng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
