
Nikura3/layout-guidance


Testing layout-guidance for my master's thesis

Training-Free Layout Control with Cross-Attention Guidance

Minghao Chen, Iro Laina, Andrea Vedaldi

[Paper] [Project Page] [Demo]

Our method controls the layout of images generated by large pretrained text-to-image diffusion models, without any training, through layout guidance performed on the cross-attention maps.

Abstract

Recent diffusion-based generators can produce high-quality images based only on textual prompts. However, they do not correctly interpret instructions that specify the spatial layout of the composition. We propose a simple approach that can achieve robust layout control without requiring training or fine-tuning the image generator. Our technique, which we call layout guidance, manipulates the cross-attention layers that the model uses to interface textual and visual information and steers the reconstruction in the desired direction given, e.g., a user-specified layout. In order to determine how to best guide attention, we study the role of different attention maps when generating images and experiment with two alternative strategies, forward and backward guidance. We evaluate our method quantitatively and qualitatively with several experiments, validating its effectiveness. We further demonstrate its versatility by extending layout guidance to the task of editing the layout and context of a given real image.
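As a rough illustration of the backward-guidance idea, here is a toy sketch, not the code from this repository: it assumes PyTorch, a single 16x16 cross-attention map for the target token, and a plain gradient step on a stand-in latent, whereas the real method operates on the cross-attention layers of a pretrained diffusion UNet during sampling.

import torch

def bbox_mask(h, w, box):
    # box = (x0, y0, x1, y1) given as fractions of the map size
    x0, y0, x1, y1 = box
    mask = torch.zeros(h, w)
    mask[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)] = 1.0
    return mask

def layout_loss(attn, mask):
    # Penalize attention mass of the target token that falls outside the box.
    inside = (attn * mask).sum()
    return (1.0 - inside / (attn.sum() + 1e-8)) ** 2

# Stand-in latent and a toy "cross-attention map" derived from it.
latent = torch.randn(1, 4, 16, 16, requires_grad=True)
mask = bbox_mask(16, 16, (0.1, 0.1, 0.5, 0.5))

for _ in range(10):
    attn = torch.softmax(latent.mean(dim=1).flatten(), dim=0).view(16, 16)
    loss = layout_loss(attn, mask)
    grad, = torch.autograd.grad(loss, latent)
    # Backward guidance: nudge the latent so attention concentrates in the box.
    latent = (latent - 1.0 * grad).detach().requires_grad_(True)

Roughly speaking, backward guidance backpropagates a loss of this kind into the latent, while forward guidance biases the attention maps toward the target region directly.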

Quick start

conda create --name layout_guidance python=3.10
conda activate layout_guidance
pip install -r requirements.txt

Image generation

The .csv file containing the prompts should be placed inside a folder named prompts located in the root of the project.

The .csv file is expected to have the following structure (no limit on the number of objects): id,prompt,obj1,bbox1,obj2,bbox2,obj3,bbox3,obj4,bbox4
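For example, prompts/prompts.csv might look like the following. The prompt, object names, and bounding-box values here are hypothetical, and the exact bounding-box encoding depends on how inference.py parses the bbox columns.

id,prompt,obj1,bbox1,obj2,bbox2,obj3,bbox3,obj4,bbox4
0,a cat and a dog in a park,cat,"[0.1, 0.5, 0.4, 0.9]",dog,"[0.6, 0.5, 0.9, 0.9]",,,,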

Run the script inference.py to generate the images.
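Assuming no extra arguments are required (check the script for configurable options), this is simply:

python inference.py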

Citation

@article{chen2023trainingfree,
      title={Training-Free Layout Control with Cross-Attention Guidance},
      author={Minghao Chen and Iro Laina and Andrea Vedaldi},
      journal={arXiv preprint arXiv:2304.03373},
      year={2023}
}

Acknowledgements

This research is supported by ERC-CoG UNION 101001212. The code is inspired by Diffusers and Stable Diffusion.
