This repository is an official implementation of CSCL: "Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation" (CVPR 2025).
- [2025/6/9] The camera-ready version is released.
- [2025/6/9] Code and weights are released.
- [2025/2/27] CSCL is accepted to CVPR 2025 🎉🎉.
conda create -n CSCL python=3.8
conda activate CSCL
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install -r code/MultiModal-DeepFake-main/requirements.txt
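To quickly confirm the environment is working, here is a minimal sanity check (a convenience sketch, not part of the original codebase):

```python
# check_env.py -- sanity check for the installed environment (convenience sketch)
import torch
import torchvision

print("torch:", torch.__version__)               # expected: 1.9.0+cu111
print("torchvision:", torchvision.__version__)   # expected: 0.10.0+cu111
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```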
Here are the pre-trained models:
Download meter_clip16_224_roberta_pretrain.ckpt: link
Download ViT-B-16.pt: link
Download roberta-base: link
Download Datasets: link
The folder structure (a quick layout check is sketched after the tree):
./
├── code
│   └── MultiModal-Deepfake (this github repo)
│       ├── configs
│       │   └── ...
│       ├── dataset
│       │   └── ...
│       ├── models
│       │   └── ...
│       ├── ...
│       ├── roberta-base
│       ├── ViT-B-16.pt
│       └── meter_clip16_224_roberta_pretrain.ckpt
└── datasets
    └── DGM4
        ├── manipulation
        ├── origin
        └── metadata
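Before training, you may want to confirm the downloads ended up in the right places. A small sketch (the relative paths simply mirror the tree above; adjust them if you run the script from a different directory):

```python
# check_layout.py -- convenience sketch to verify the expected folder layout
from pathlib import Path

ROOT = Path(".")  # the top-level folder shown in the tree above
expected = [
    "code/MultiModal-Deepfake/roberta-base",
    "code/MultiModal-Deepfake/ViT-B-16.pt",
    "code/MultiModal-Deepfake/meter_clip16_224_roberta_pretrain.ckpt",
    "datasets/DGM4/manipulation",
    "datasets/DGM4/origin",
    "datasets/DGM4/metadata",
]
for rel in expected:
    status = "ok" if (ROOT / rel).exists() else "MISSING"
    print(f"{status:7s} {rel}")
```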
Our pre-trained CSCL model: link (96.34 AUC, 92.48 mAP, 84.07 IoUm, 76.62 F1). We use the train and val sets for training and the test set for evaluation.
Make a folder ./results/CSCL/ and put the pre-trained model in it.
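For example, assuming the downloaded checkpoint is named CSCL.pth (a hypothetical filename; use whatever the actual file is called), you can create the folder and sanity-load the weights like this:

```python
# prepare_ckpt.py -- sketch: create ./results/CSCL/ and sanity-load the checkpoint
import os
import torch

os.makedirs("./results/CSCL/", exist_ok=True)

# "CSCL.pth" is a hypothetical filename; replace it with the actual name of the
# downloaded pre-trained CSCL checkpoint.
ckpt_path = "./results/CSCL/CSCL.pth"
if os.path.exists(ckpt_path):
    state = torch.load(ckpt_path, map_location="cpu")
    keys = list(state.keys()) if isinstance(state, dict) else []
    print("checkpoint loaded; top-level keys:", keys[:5])
else:
    print("checkpoint not found at", ckpt_path)
```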
Training
sh train.sh
Evaluation
sh test.sh
Visualization
Use the visualize_res function in utils.py (refer to test.py for details).
Evaluation on text or image subset
Refer to line 136 in test.py.
We thank these great works and open-source codebases: DGM4 and METER.
If you find our work useful, please give this repo a star and cite it as:
@inproceedings{li2025unleashing,
  title={Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation},
  author={Li, Yiheng and Yang, Yang and Tan, Zichang and Liu, Huan and Chen, Weihua and Zhou, Xu and Lei, Zhen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={9242--9252},
  year={2025}
}