Diffusion Bench

This is a bench for learning about prompt engineering when using text-to-image and image-to-image diffusion models. It is currently in DRAFT form... I'm not tracking breaking changes.

[image: example step analysis output]

Installation

If you don't have asdf (a language version manager) and direnv installed, install them first:

# Install python using asdf or your
# chosen version manager
brew install asdf direnv
asdf plugin add python https://github.com/danhper/asdf-python.git
asdf install python 3.10.13

# configure the version of python to use
asdf local python 3.10.13
# or asdf global python 3.10.13
# Note: when using local a ".tool-versions" file is created

Then you can create the virtual environment, install packages, and enable the ENVVARS with direnv:

# configure your ENVVARS
# (open the .envrc file and make necessary changes)
cp .envrc-example .envrc

# create the venv, install packages, and enable ENVVARS
direnv allow .
# If you don't want to use direnv, read the .envrc-example
# file and execute most of those commands using your
# preferred method
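
If you skip direnv, a minimal manual equivalent looks roughly like this (a sketch; the exact variables and venv path live in .envrc-example, so treat the specifics here as assumptions rather than the canonical steps):

# create and activate a venv, then install the pinned dependencies
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt
# then export the ENVVARS defined in .envrc-example using your preferred method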

Installation Notes

Make sure to keep the requirements.txt up to date when adding dependencies:

python3 -m pip install scipy
python3 -m pip freeze > requirements.txt

The original install to produce that requirements.txt was:

python3 -m pip install --upgrade transformers sentencepiece \
               diffusers ipykernel invisible_watermark \
               accelerate safetensors torch ipyplot scipy

If you want to use TencentARC/t2i-adapter-lineart-sdxl-1.0, it needs a development version of diffusers, plus controlnet_aux for conditioning models and detectors:

python3 -m pip uninstall diffusers
python3 -m pip install --upgrade \
  git+https://github.com/huggingface/diffusers.git
python3 -m pip install --upgrade controlnet_aux==0.0.7

Usage

python3 -m src [-h] [-p PROMPT] [-n NEGATIVE_PROMPT]
  [-x WIDTH] [-y HEIGHT] [-c COUNT] [-s SEEDS] [-m MODELS]
  [-r STEPS] [-o OUTPUT_PATH] [-t OUTPUT_PATH_TEMPLATE]
  [-i INPUT_PATHS] [-a REFINEMENT_MODE] [-d DEVICE_TYPE]

options:

  • -h, --help: show this help message and exit
  • -p, --prompt: A pipe-delimited list of descriptions of what you would like to render
  • -n, --negative_prompt: A pipe-delimited list of descriptions of what you would NOT like to render (default=None)
  • -x, --width: The width of the output image (default=896)
  • -y, --height: The height of the output image (default=640)
  • -c, --count: The number of images to produce with each given model (default=1)
  • -s, --seeds: A comma-separated list of PRNG seeds to use when generating images, to produce more predictable results and explore an idea
  • -m, --models: A comma-separated list of Hugging Face models to use (default: dreamlike-art/dreamlike-photoreal-2.0 generates an image, followed by 3 sequential refinement passes using stabilityai/stable-diffusion-xl-refiner-1.0)
  • -r, --steps: A comma-separated list of the number of inference steps to use with each model (length must match the length of -m)
  • -o, --output_path: A path to a folder where images will be saved (can be relative)
  • -t, --output_path_template: A template for naming the files (default=":path/:count_idx-:type-:model_idx.png"; see the example after this list)
  • -i, --input_paths: A comma-separated list of paths to images that will be refined or upscaled by the given models
  • -d, --device_type: The type of device the pipes will be fed to for processing (default="cuda" if CUDA is supported, else "mps" on Apple M1/M2, else "cpu")
  • -a, --refinement_mode: one of the following (see the example after this list):
    • "sequence" (each pass is fed into the next pass),
    • "first_to_many" (each pass is fed the first item generated),
    • "in_to_many" (each pass is fed the value of -i/--input_paths)
  • --copyright: Who should be granted ownership of this image

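For example, the following invocation combines the two flags referenced above (a sketch using only the documented flags and the default template tokens; the exact rendered filenames depend on how the tokens expand):

python3 -m src --prompt "a cute, tabby cat" \
  --models "dreamlike-art/dreamlike-photoreal-2.0, stabilityai/stable-diffusion-xl-refiner-1.0, stabilityai/stable-diffusion-xl-refiner-1.0" \
  --steps "10, 50, 50" \
  --refinement_mode "first_to_many" \
  --output_path "images" \
  --output_path_template ":path/:count_idx-:type-:model_idx.png"

Here "first_to_many" feeds the first generated image into each of the two refinement passes, rather than chaining one pass into the next.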

Examples

Generate images (text-to-image)

Given the same prompt, model, and number of inference steps, generate 3 images:

python3 -m src --prompt "a cute, tabby cat" \
  --models "dreamlike-art/dreamlike-photoreal-2.0" \
  --steps "10" \
  --count 3

# =>
# loading pipeline: dreamlike-art/dreamlike-photoreal-2.0
# Loading pipeline components...: 100%|███████| 5/5
# [00:01<00:00,  4.06it/s]
# prompt:          ['a cute, tabby cat']
# negative_prompt: []
# width:           896
# height:          640
# count:           3
# seeds:           []
# model_ids:       ['dreamlike-art/dreamlike-photoreal-2.0']
# steps:           [10]
# input_paths:     []
# paths:           [['images/out-01-GENERATOR-01.png',
#                    'images/out-02-GENERATOR-01.png',
#                    'images/out-03-GENERATOR-01.png']]
# device:          mps
#
# model_id: dreamlike-art/dreamlike-photoreal-2.0
# prompt: a cute, tabby cat
# kwargs: {'negative_prompt': None, 'num_inference_steps': 10,
#          'width': 896, 'height': 640}
# seed:   4042405154439330027
#
# 100%|████████████████████████████████████| 10/10
# [00:35<00:00,  3.52s/it]
# ...

3 images generated using dreamlike-photoreal-2.0 with the prompt, a cute, tabby cat
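
Each run prints the seed it used, so to iterate on a result you like, pass that seed back in (a sketch reusing the seed from the log above):

python3 -m src --prompt "a cute, tabby cat" \
  --models "dreamlike-art/dreamlike-photoreal-2.0" \
  --steps "10" \
  --seeds "4042405154439330027"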

Generating images using different models (text-to-image)

Given the same prompt and number of inference steps, generate 3 images each for 2 different models: "dreamlike-art/dreamlike-photoreal-2.0" and "stabilityai/stable-diffusion-xl-base-1.0":

python3 -m src --prompt "a cute, tabby cat" \
  --models "dreamlike-art/dreamlike-photoreal-2.0, stabilityai/stable-diffusion-xl-base-1.0" \
  --steps "10, 10" \
  --count 3

3 images per model, generated using dreamlike-photoreal-2.0 and stable-diffusion-xl-base-1.0 with the prompt, a cute, tabby cat

Evaluate the effects of changing steps (text-to-image)

Given the same prompt, model and seed, generate images using a different number of inference steps for each image:

python3 -m src --prompt "a cute, tabby cat" \
  --models "dreamlike-art/dreamlike-photoreal-2.0, dreamlike-art/dreamlike-photoreal-2.0, dreamlike-art/dreamlike-photoreal-2.0, dreamlike-art/dreamlike-photoreal-2.0" \
  --steps "10, 15, 20, 25" \
  --seed "1175243130925179488, 1175243130925179488, 1175243130925179488, 1175243130925179488" \
  --count 1

4 images generated using dreamlike-photoreal-2.0 with the same prompt and seed, using a different number of inference steps for each image (10, 15, 20, 25)

Evaluate negative prompts (text-to-image)

Given a base image with a blurry, mangled face:

python3 -m src --prompt "a cute, tabby cat" \
  --models "dreamlike-art/dreamlike-photoreal-2.0" \
  --seeds "12621049111909025765" \
  --steps "10"

1 image with a blurry, mangled cat face, generated using dreamlike-photoreal-2.0 with the prompt, a cute, tabby cat

Generate 4 images using a different negative prompt for each image ("blurry", "mangled face", "deformed face", "blurry, mangled face, deformed face"):

python3 -m src --prompt "a cute, tabby cat" \
  --negative_prompt "blurry | mangled face | deformed face | blurry, mangled face, deformed face" \
  --models "dreamlike-art/dreamlike-photoreal-2.0, dreamlike-art/dreamlike-photoreal-2.0, dreamlike-art/dreamlike-photoreal-2.0, dreamlike-art/dreamlike-photoreal-2.0" \
  --seeds "12621049111909025765, 12621049111909025765, 12621049111909025765, 12621049111909025765" \
  --steps "10, 10, 10, 10"

4 images generated using dreamlike-photoreal-2.0 with the prompt, a cute, tabby cat, and 4 different negative prompts: "blurry", "mangled face", "deformed face", "blurry, mangled face, deformed face"

Evaluate prompts, given the same negative prompt (text-to-image)

Given the same negative prompt, model, seed, and number of inference steps, generate 4 images with different prompts for each image (a dog, a cat, a bear, a pig):

python3 -m src --prompt "a dog | a cat | a bear | a pig" \
  --negative_prompt "outside" \
  --models "dreamlike-art/dreamlike-photoreal-2.0, dreamlike-art/dreamlike-photoreal-2.0, dreamlike-art/dreamlike-photoreal-2.0, dreamlike-art/dreamlike-photoreal-2.0" \
  --steps "10, 10, 10, 10" \
  --seed "1175243130925179488, 1175243130925179488, 1175243130925179488, 1175243130925179488" \
  --count 1

4 images generated using dreamlike-photoreal-2.0 with the negative prompt, "outside", and 4 different prompts: "a dog", "a cat", "a bear", "a pig"

NOTE: the bear has the most outdoor context. Is this because the model was trained with pictures of domesticated animals both inside and outside, but with wild animals that are primarily outside?

Refine images with ensembles (text-to-image + image-to-image)

Given a prompt:

  1. generate an image
  2. refine the image generated in step 1
  3. refine the image refined in step 2

python3 -m src --prompt "a cute, tabby cat" \
  --models "dreamlike-art/dreamlike-photoreal-2.0, stabilityai/stable-diffusion-xl-refiner-1.0, stabilityai/stable-diffusion-xl-refiner-1.0" \
  --steps "10, 50, 50" \
  --refinement_mode "sequence"

3 images, the first one being generated by dreamlike-photoreal-2.0 using the prompt, a cute, tabby cat, and the following two having been refined in a sequence using stable-diffusion-xl-refiner-1.0

Refine images with ensembles (image-to-image)

Given an input image:

  1. load the image from disk
  2. refine the image loaded in step 1
  3. refine the image refined in step 2

python3 -m src --prompt "a cute, tabby cat" \
  --input_paths "images/01-generator-01.png" \
  --models "stabilityai/stable-diffusion-xl-refiner-1.0, stabilityai/stable-diffusion-xl-refiner-1.0" \
  --steps "50, 50" \
  --refinement_mode "sequence"

3 images, the first one loaded from disk, and the following two having been refined in a sequence using stable-diffusion-xl-refiner-1.0

Yes, and... with ensembles (text-to-image + image-to-image)

Given a prompt:

  1. generate an image
  2. refine the image generated in step 1

python3 -m src --prompt "a cute, tabby cat | replace the cat with a lion" \
  --models "dreamlike-art/dreamlike-photoreal-2.0, timbrooks/instruct-pix2pix" \
  --steps "10, 10" \
  --refinement_mode "sequence" \
  --seeds "16340881476748913736"

2 images, the first one generated with dreamlike-photoreal-2.0 using the prompt, a cute, tabby cat, and the second one refining that using pix2pix and the prompt, replace the cat with a lion

Licensing

Make sure to read the licenses for any model you use with this bench; they vary from model to model. For instance, dreamlike-art/dreamlike-photoreal-2.0 uses an adaptation of Creative Commons and limits corporate usage.

This bench should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
