
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation

Yiheng Li, Yang Yang, Zichang Tan, Huan Liu, Weihua Chen, Xu Zhou and Zhen Lei

MAIS&CASIA, UCAS, Sangfor, BJTU and Alibaba


Introduction

This repository is the official implementation of CSCL.

News

  • [2025/6/9] The camera-ready version is released.
  • [2025/6/9] Code and weights are released.
  • [2025/2/27] CSCL is accepted by CVPR 2025🎉🎉.

Environment Setting

conda create -n CSCL python=3.8
conda activate CSCL
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install -r code/MultiModal-DeepFake-main/requirements.txt
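
After installing, you can sanity-check the environment with a short script (a minimal sketch; the exact CUDA build strings depend on your local driver):

# check_env.py -- quick environment sanity check (illustrative)
import torch
import torchvision

print("torch:", torch.__version__)              # expect 1.9.0+cu111
print("torchvision:", torchvision.__version__)  # expect 0.10.0+cu111
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))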

Data Preparation

Here are the required pre-trained models and datasets:

Download meter_clip16_224_roberta_pretrain.ckpt: link

Download ViT-B-16.pt: link

Download roberta-base: link

Download Datasets: link

The folder structure:

./
├── code
│   └── MultiModal-Deepfake (this github repo)
│       ├── configs
│       │   └──...
│       ├── dataset
│       │   └──...
│       ├── models
│       │   └──...
│       ...
│       ├── roberta-base
│       ├── ViT-B-16.pt
│       └── meter_clip16_224_roberta_pretrain.ckpt
└── datasets
    └── DGM4
        ├── manipulation
        ├── origin
        └── metadata

Our pre-trained CSCL model: link (96.34 AUC, 92.48 mAP, 84.07 IoUm, 76.62 F1). We use the train and val sets for training and the test set for evaluation.

Create the folder ./results/CSCL/ and put the pre-trained model in it.
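
Before training or evaluating, you can verify the layout with a short script run from the project root (a minimal sketch; the REPO path is an assumption, adjust it to your clone's name, since the paths above use both MultiModal-Deepfake and MultiModal-DeepFake-main):

# check_layout.py -- verify the folder structure above (illustrative)
import os

REPO = "code/MultiModal-DeepFake-main"  # assumption: adjust to your clone name

required = [
    f"{REPO}/roberta-base",
    f"{REPO}/ViT-B-16.pt",
    f"{REPO}/meter_clip16_224_roberta_pretrain.ckpt",
    "datasets/DGM4/manipulation",
    "datasets/DGM4/origin",
    "datasets/DGM4/metadata",
    "results/CSCL",  # the pre-trained CSCL model goes here
]
for path in required:
    print(("ok      " if os.path.exists(path) else "MISSING ") + path)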

Train

sh train.sh

Inference

Evaluation

sh test.sh
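
test.py computes the full metric suite reported above (AUC, mAP, IoU, F1). As a repo-independent illustration of the binary detection metrics only, assuming scikit-learn is installed:

# metrics_sketch.py -- standard binary detection metrics (illustrative;
# test.py computes the full AUC/mAP/IoU/F1 suite internally)
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

labels = np.array([0, 1, 1, 0, 1])            # dummy real/fake labels
scores = np.array([0.1, 0.8, 0.6, 0.3, 0.9])  # dummy detection scores

print("AUC:", roc_auc_score(labels, scores))
print("F1 :", f1_score(labels, scores > 0.5))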

Visualization

Use the visualize_res function in utils.py (refer to test.py for details).

Evaluation on text or image subset

Refer to line 136 in test.py.

Acknowledgements

We thank these great works and open-source codebases: DGM4 and METER.

Citation

If you find our work useful, please give this repo a star and cite it as:

@inproceedings{li2025unleashing,
  title={Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation},
  author={Li, Yiheng and Yang, Yang and Tan, Zichang and Liu, Huan and Chen, Weihua and Zhou, Xu and Lei, Zhen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={9242--9252},
  year={2025}
}
