A simple benchmark for turning different types of features on and off in SDXL Turbo while generating images.
- Create a new environment (recommended)

```bash
conda create -n riebench python=3.12
conda activate riebench
```
- Install sdxl-unbox

```bash
cd sdxl-unbox
pip install -r requirements.txt
```
- Install Grounded SAM 2 (`/path/to/cuda-12.1/` is normally `/usr/local/cuda-12.1`)

```bash
cd Grounded-SAM-2
export CUDA_HOME=/path/to/cuda-12.1/
pip install -e .
pip install --no-build-isolation -e grounding_dino
cd checkpoints
bash download_ckpts.sh
cd ../gdino_checkpoints
bash download_ckpts.sh
```
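A quick way to confirm the install worked is a minimal import check. This is a hedged sketch: the package names `sam2` and `groundingdino` are what the Grounded-SAM-2 repo's two editable installs above provide, but adjust if your checkout differs.

```python
# Hedged sanity check: verifies CUDA is visible and the Grounded-SAM-2
# packages import cleanly. Package names are assumptions based on the
# repo's setup scripts, not part of RIEBench itself.
import torch

assert torch.cuda.is_available(), "CUDA not found; check CUDA_HOME and your driver."

import sam2          # installed by `pip install -e .` in Grounded-SAM-2
import groundingdino # installed by `pip install --no-build-isolation -e grounding_dino`

print("CUDA device:", torch.cuda.get_device_name(0))
print("Grounded SAM 2 imports OK")
```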
- Install the remaining requirements (in `RIEBench/`)

```bash
pip install -r requirements.txt
```
- Download the $k=160$, $n_f=5120$ SAEs

```bash
GIT_LFS_SKIP_SMUDGE=1 git clone git@hf.co:wendlerc/sdxl-turbo-saes
cd sdxl-turbo-saes
git lfs pull --include="\
unet.down_blocks.2.attentions.1_k160_hidden5120_auxk256_bs4096_lr0.0001/*,\
unet.mid_block.attentions.0_k160_hidden5120_auxk256_bs4096_lr0.0001/*,\
unet.up_blocks.0.attentions.0_k160_hidden5120_auxk256_bs4096_lr0.0001/*,\
unet.up_blocks.0.attentions.1_k160_hidden5120_auxk256_bs4096_lr0.0001/*"
```
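For orientation, the directory names encode the SAE hyperparameters: $k=160$ active latents per token out of $n_f=5120$ dictionary features, with an auxiliary-$k$ of 256. The sketch below only illustrates what a top-k SAE forward pass with these shapes looks like; the actual loading code lives in sdxl-unbox, and all weights and names here are placeholders.

```python
# Hedged sketch of a top-k SAE forward pass, NOT the sdxl-unbox loading code.
# It shows what k=160 and n_f=5120 mean for activations of width d_model
# (1280 channels for SDXL's mid block).
import torch

d_model, n_f, k = 1280, 5120, 160

W_enc = torch.randn(d_model, n_f) / d_model**0.5  # encoder weights (placeholder)
W_dec = torch.randn(n_f, d_model) / n_f**0.5      # decoder weights (placeholder)
b_enc = torch.zeros(n_f)

def topk_sae(x: torch.Tensor) -> torch.Tensor:
    """Encode x, keep only the k largest latents, and decode."""
    pre = x @ W_enc + b_enc                        # (batch, n_f) pre-activations
    vals, idx = pre.topk(k, dim=-1)                # keep the k strongest features
    z = torch.zeros_like(pre).scatter(-1, idx, torch.relu(vals))
    return z @ W_dec                               # reconstruction, (batch, d_model)

x_hat = topk_sae(torch.randn(4, d_model))
print(x_hat.shape)  # torch.Size([4, 1280])
```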
Transporting 80 SAE features with strength 2:

```bash
papermill main.ipynb out/main.ipynb -p k_trans 80 -p m1 2
```

Transporting 10000 neurons with strength 2:

```bash
papermill main.ipynb out/main.ipynb -p k_trans 10000 -p m1 2 -p mode neurons
```

Steering:

```bash
papermill main.ipynb out/main.ipynb -p m1 1 -p mode steering
```
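Papermill injects the `-p` values into the notebook's cell tagged `parameters`, so the same runs can also be scripted from Python. A minimal sketch sweeping the three modes above (the parameter names mirror the CLI flags; `out/` must already exist):

```python
# Hedged sketch: run the three intervention modes programmatically.
# papermill.execute_notebook is the standard papermill API; the parameter
# names (k_trans, m1, mode) come straight from the CLI examples above.
import papermill as pm

runs = [
    {"k_trans": 80, "m1": 2},                        # SAE feature transport
    {"k_trans": 10000, "m1": 2, "mode": "neurons"},  # neuron transport
    {"m1": 1, "mode": "steering"},                   # steering
]

for params in runs:
    tag = params.get("mode", "sae")
    pm.execute_notebook("main.ipynb", f"out/main_{tag}.ipynb", parameters=params)
```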
This will create a set of subfolders in `results/`, each containing the resulting images.
To compute the metrics, we first need to create the reference images:

```bash
papermill create_reference.ipynb out/create_reference.ipynb
```
Now you can compute the LPIPS and CLIP scores for a method/result folder, e.g., assuming you ran with the hyperparameters above:

```bash
papermill score.ipynb out/score.ipynb -p path ./results/modesae_spatialTrue_subtractTrue_downTrue_upTrue_up0True_midTrue_T4_ktrans80_str2.0 -p name sae_80_2
```
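If you want to compute the same metrics standalone, here is a rough sketch using the `lpips` package and torchmetrics' `CLIPScore`. These are standard implementations and may differ from `score.ipynb` in preprocessing, model choice, or aggregation; the image paths and prompt are hypothetical.

```python
# Hedged sketch: standalone LPIPS and CLIP score for one image pair.
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor
import lpips
from torchmetrics.multimodal.clip_score import CLIPScore

lpips_fn = lpips.LPIPS(net="alex")  # AlexNet-backed LPIPS
clip_metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

def load(path):  # image as a (1, 3, H, W) float tensor in [0, 1]
    return to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)

edited, reference = load("edited.png"), load("reference.png")  # hypothetical paths

# LPIPS expects inputs scaled to [-1, 1]; lower = closer to the reference.
lpips_val = lpips_fn(edited * 2 - 1, reference * 2 - 1).item()

# CLIPScore expects uint8 images in [0, 255] plus the target edit prompt.
clip_val = clip_metric((edited * 255).to(torch.uint8), ["a photo of a cat"]).item()

print(f"LPIPS: {lpips_val:.4f}  CLIP score: {clip_val:.2f}")
```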
The scoring run writes a `sae_80_2.csv` into the corresponding result folder.
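The CSV can then be summarized with pandas; the exact column names depend on what `score.ipynb` writes, so `describe()` avoids hard-coding them:

```python
# Hedged sketch: summarize a result CSV without assuming its column names.
import pandas as pd

df = pd.read_csv(
    "./results/modesae_spatialTrue_subtractTrue_downTrue_upTrue_up0True_midTrue_T4_ktrans80_str2.0/sae_80_2.csv"
)
print(df.describe())
```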
```bibtex
@misc{surkov2025onestepenoughsparseautoencoders,
      title={One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models},
      author={Viacheslav Surkov and Chris Wendler and Antonio Mari and Mikhail Terekhov and Justin Deschenaux and Robert West and Caglar Gulcehre and David Bau},
      year={2025},
      eprint={2410.22366},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.22366},
}
```