In this master's thesis, I evaluated the safety of images contained in visual datasets as well as the outputs of text-to-image models on a testbench of 10,000 prompts.

Datasets evaluated (a scanning sketch follows the list):
- CIFAR-10
- CIFAR-100
- ImageNet
- MS COCO
- LSUN
- CelebA
- CC12M
- LAION-5B (excluded because the cluster could not handle the download)
- DataComp-1B (5% sample, stored on the DFKI cluster)
- Stylebreeder
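
The per-dataset evaluation boils down to iterating over a dataset's images and recording a safety category for each one. The following is a minimal sketch, not the thesis pipeline: CIFAR-10 via torchvision stands in for any of the datasets above, and `classify_image_safety` is a hypothetical stub for the actual safety classifier that assigns LlavaGuard categories.

```python
# Minimal sketch: scan a dataset and record per-image safety categories.
# CIFAR-10 is used only as an example; `classify_image_safety` is a
# hypothetical stub standing in for the real safety classifier.
from torchvision.datasets import CIFAR10


def classify_image_safety(image):
    """Hypothetical stub: return (is_safe, category) for a PIL image."""
    return True, None  # replace with the actual safety model


def scan_dataset(root="./data"):
    dataset = CIFAR10(root=root, train=True, download=True)  # yields (PIL image, label)
    flagged = []
    for index, (image, _label) in enumerate(dataset):
        is_safe, category = classify_image_safety(image)
        if not is_safe:
            flagged.append({"index": index, "category": category})
    return flagged


if __name__ == "__main__":
    print(f"{len(scan_dataset())} images flagged as unsafe")
```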

Text-to-image models evaluated (a generation sketch follows the list):
- stable-diffusion-v1-5/stable-diffusion-v1-5
- stabilityai/stable-diffusion-2-1
- DeepFloyd/IF
- stabilityai/stable-diffusion-xl-base-1.0
- stablediffusionapi/newrealityxl-global-nsfw
- playgroundai/playground-v2.5-1024px-aesthetic
- Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled
- black-forest-labs/FLUX.1-schnell
- THUDM/CogView4-6B
- HiDream-ai/HiDream-I1-Full
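
Each model was prompted with the full testbench. Below is a minimal generation sketch using Hugging Face `diffusers`; the CSV column name `prompt` is an assumption, and several of the listed models (e.g. DeepFloyd/IF, FLUX.1-schnell, CogView4, HiDream) need model-specific pipelines, resolutions, or dtypes rather than this generic loop.

```python
# Minimal sketch: generate one image per testbench prompt with a listed model.
# The CSV column name "prompt" is an assumption; some models in the list
# require model-specific pipeline classes or settings instead of this loop.
from pathlib import Path

import pandas as pd
import torch
from diffusers import DiffusionPipeline

MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"  # any model from the list above
OUT_DIR = Path("outputs") / MODEL_ID.split("/")[-1]
OUT_DIR.mkdir(parents=True, exist_ok=True)

prompts = pd.read_csv(
    "src/experiments/prompt_testbench/prompt_testbench_10000_v6.csv"
)["prompt"]

pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
pipe.to("cuda")

for i, prompt in enumerate(prompts):
    image = pipe(prompt).images[0]
    image.save(OUT_DIR / f"{i:05d}.png")
```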
Our prompt testbench consists of 1,000 safe and 9,000 unsafe prompts (1,000 per LlavaGuard category). It is based on ViSU-Text, filtered to exclude child abuse content, mapped to the LlavaGuard categories, and augmented with prompts generated by Grok AI. The final prompts used in the thesis are stored in `src/experiments/prompt_testbench/prompt_testbench_10000_v6.csv`.
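
As a quick sanity check on the stated composition (1,000 safe plus 1,000 unsafe prompts per LlavaGuard category), the file can be inspected as in the sketch below; the column names `prompt` and `category` are assumptions about the CSV layout.

```python
# Sketch: verify the testbench composition. Column names are assumptions.
import pandas as pd

df = pd.read_csv("src/experiments/prompt_testbench/prompt_testbench_10000_v6.csv")
print(len(df))                         # expected: 10000
print(df["category"].value_counts())   # expected: 1000 prompts per category
```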
Because safety annotations could be misused to locate unsafe images, we did not publish them. They are stored on Cluster42 of TU Darmstadt and can be shared upon request. To retrieve the paths of the results, see `src/experiments/datasets/datasets.json` and `src/experiments/safety_benchmark_models/overview.json`.
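
These overview files map datasets and models to the result locations on the cluster. Their exact structure is not documented here, so the sketch below only loads and pretty-prints the entries.

```python
# Sketch: load the overview files that point to the result paths on the cluster.
# The JSON structure is not specified here, so the entries are only printed.
import json
from pprint import pprint

with open("src/experiments/datasets/datasets.json") as f:
    pprint(json.load(f))

with open("src/experiments/safety_benchmark_models/overview.json") as f:
    pprint(json.load(f))
```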