fix broken imgs in unit 3 #94

Open · wants to merge 1 commit into main
4 changes: 2 additions & 2 deletions unit3/README.md
@@ -17,7 +17,7 @@ Here are the steps for this unit:

## Introduction

-![SD example images](sd_demo_images.jpg)<br>
+![SD example images](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/diffusion-course/sd_demo_images.jpg)<br>
_Example images generated using Stable Diffusion_

Stable Diffusion is a powerful text-conditioned latent diffusion model. Don't worry, we'll explain those words shortly! Its ability to create amazing images from text descriptions has made it an internet sensation. In this unit, we're going to explore how SD works and see what other tricks it can do.
@@ -50,7 +50,7 @@ OK, so how do we actually feed this conditioning information into the UNet for i

It turns out that even with all of the effort put into making the text conditioning as useful as possible, the model still tends to default to relying mostly on the noisy input image rather than the prompt when making its predictions. In a way, this makes sense - many captions are only loosely related to their associated images and so the model learns not to rely too heavily on the descriptions! However, this is undesirable when it comes time to generate new images - if the model doesn't follow the prompt then we may get images out that don't relate to our description at all.

-![CFG scale demo grid](cfg_example_0_1_2_10.jpeg)<br>
+![CFG scale demo grid](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/diffusion-course/cfg_example_0_1_2_10.jpeg)<br>
_Images generated from the prompt "An oil painting of a collie in a top hat" with CFG scale 0, 1, 2 and 10 (left to right)_

To fix this, we use a trick called Classifier-Free Guidance (CFG). During training, text conditioning is sometimes kept blank, forcing the model to learn to denoise images with no text information whatsoever (unconditional generation). Then at inference time, we make two separate predictions: one with the text prompt as conditioning and one without. We can then use the difference between these two predictions to create a final combined prediction that pushes **even further** in the direction indicated by the text-conditioned prediction according to some scaling factor (the guidance scale), hopefully resulting in an image that better matches the prompt. The image above shows the outputs for a prompt at different guidance scales - as you can see, higher values result in images that better match the description.
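
As a rough sketch of how the two predictions are typically combined at inference time (the tensors and the `guidance_scale` value below are illustrative placeholders, not code from the unit's notebook):

```python
import torch

# Placeholder noise predictions for the same latent from two UNet passes:
# one conditioned on the prompt embeddings, one on empty ("") embeddings.
noise_pred_uncond = torch.randn(1, 4, 64, 64)
noise_pred_text = torch.randn(1, 4, 64, 64)

guidance_scale = 7.5  # 0 = unconditional, 1 = plain conditional, higher = follow the prompt more

# Classifier-free guidance: start from the unconditional prediction and push
# further along the direction suggested by the text conditioning.
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
```

With `guidance_scale` set to 0 this reduces to the unconditional prediction, and with 1 to the ordinary text-conditioned one; larger values lean harder on the prompt, which is what the grid of images above illustrates.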