Official implementation of the paper: GuardT2I: Defending Text-to-Image Models from Adversarial Prompts (NeurIPS 2024)
GuardT2I: Defending Text-to-Image Models from Adversarial Prompts
Recent advancements in Text-to-Image (T2I) models have raised significant safety concerns about their potential misuse for generating inappropriate or Not-Safe-For-Work (NSFW) content, despite existing countermeasures such as NSFW classifiers or model fine-tuning for inappropriate concept removal. Addressing this challenge, our study unveils GuardT2I, a novel moderation framework that adopts a generative approach to enhance T2I models' robustness against adversarial prompts. Instead of performing binary classification, GuardT2I utilizes a Large Language Model (LLM) to conditionally transform text guidance embeddings within the T2I models into natural language for effective adversarial prompt detection, without compromising the models' inherent performance. Our extensive experiments reveal that GuardT2I outperforms leading commercial solutions like OpenAI-Moderation and Microsoft Azure Moderator by a significant margin across diverse adversarial scenarios.
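The sketch below illustrates this generative moderation idea in a few lines; it is not the released implementation. It assumes the conditional LLM has already decoded the T2I model's text-guidance embedding into a plain-language "interpretation" of the prompt, and the similarity model, threshold, and sensitive-word list are illustrative placeholders.

```python
# Minimal sketch of the generative moderation idea described above (not the
# released implementation): the conditional LLM is assumed to have already
# decoded the T2I model's text-guidance embedding into a plain-language
# "interpretation" of the prompt; the similarity model, threshold, and
# sensitive-word list below are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

sim_model = SentenceTransformer("all-MiniLM-L6-v2")  # any off-the-shelf sentence encoder
NSFW_WORDS = {"nudity", "gore"}                      # placeholder sensitive-word list
SIM_THRESHOLD = 0.8                                  # placeholder decision threshold


def moderate(prompt: str, interpretation: str) -> bool:
    """Return True if the prompt should be flagged as adversarial/NSFW."""
    # Adversarial prompts tend to decode into text that no longer matches the
    # surface form of the prompt they were crafted from.
    sim = util.cos_sim(
        sim_model.encode(prompt, convert_to_tensor=True),
        sim_model.encode(interpretation, convert_to_tensor=True),
    ).item()
    mismatch = sim < SIM_THRESHOLD

    # Because the interpretation is plain language, a simple word-level check
    # on it becomes effective even when the original prompt is obfuscated.
    contains_nsfw = any(w in interpretation.lower() for w in NSFW_WORDS)

    return mismatch or contains_nsfw
```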
Download the model checkpoints (*.pt) from https://huggingface.co/YijunYang280/GuardT2I and replace the model paths in evaluation.py (lines 134-137). Then run:
python evaluation.py
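If you prefer to fetch the checkpoints programmatically, the snippet below is one way to do it with the huggingface_hub client; the local directory is an arbitrary choice, and the checkpoint filenames are whatever the model repo contains.

```python
# Hedged helper for downloading the released *.pt checkpoints; the target
# directory is an arbitrary choice, not mandated by the repository.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="YijunYang280/GuardT2I",  # model repo linked above
    local_dir="./checkpoints",
    allow_patterns=["*.pt"],          # only fetch the checkpoint files
)
# Point the model paths in evaluation.py (lines 134-137) at the downloaded
# files, then run `python evaluation.py`.
print("checkpoints downloaded to", local_dir)
```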
To train GuardT2I, first download the preprocessed training dataset from https://huggingface.co/datasets/YijunYang280/GuardT2I_dataset_laion_coco_caption_only_with_clip_features.
Because the original training data files are large and cumbersome to process into the required format, we provide them as ready-to-use, self-contained shards: GuradT2I_dataset_train_part1_2M.pt, GuradT2I_dataset_train_part2_4M.pt, and GuradT2I_dataset_train_part3_4M.pt. Download any or all of them to ./datasets/ and start training.
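As a concrete example, a single shard can be fetched and inspected with the huggingface_hub and torch APIs; the shard name is taken from the list above, and loading with torch.load assumes the usual PyTorch serialization for these .pt files.

```python
# Sketch: download one preprocessed training shard into ./datasets/ and load it.
import torch
from huggingface_hub import hf_hub_download

shard_path = hf_hub_download(
    repo_id="YijunYang280/GuardT2I_dataset_laion_coco_caption_only_with_clip_features",
    repo_type="dataset",
    filename="GuradT2I_dataset_train_part1_2M.pt",  # part2_4M / part3_4M follow the same pattern
    local_dir="./datasets",
)

# Each shard is self-contained; torch.load is assumed to be the right loader
# for these .pt files (captions plus precomputed CLIP features).
data = torch.load(shard_path, map_location="cpu")
print(type(data))
```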
If you find our work useful, please cite:
@inproceedings{yang2024guardt2i,
title = {GuardT2I: Defending Text-to-Image Models from Adversarial Prompts},
author = {Yijun Yang and Ruiyuan Gao and Xiao Yang and Jianyuan Zong and Qiang Xu},
booktitle = {Advances in Neural Information Processing Systems ({NeurIPS})},
volume = {37},
year = {2024},
url = {https://arxiv.org/abs/2403.01446},
}
We also recommend reading and citing our attack framework, MMA-Diffusion:
@inproceedings{yang2024mmadiffusion,
title={{MMA-Diffusion: MultiModal Attack on Diffusion Models}},
author={Yijun Yang and Ruiyuan Gao and Xiaosen Wang and Tsung-Yi Ho and Nan Xu and Qiang Xu},
year={2024},
booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition ({CVPR})},
}