
Commit a0b4b9d

fix broken imgs
1 parent 4f0c66c

File tree

1 file changed (+2, -2 lines)


unit3/README.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ Here are the steps for this unit:
## Introduction

-![SD example images](sd_demo_images.jpg)<br>
+![SD example images](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/diffusion-course/sd_demo_images.jpg)<br>

_Example images generated using Stable Diffusion_

Stable Diffusion is a powerful text-conditioned latent diffusion model. Don't worry, we'll explain those words shortly! Its ability to create amazing images from text descriptions has made it an internet sensation. In this unit, we're going to explore how SD works and see what other tricks it can do.
@@ -50,7 +50,7 @@ OK, so how do we actually feed this conditioning information into the UNet for i
It turns out that even with all of the effort put into making the text conditioning as useful as possible, the model still tends to default to relying mostly on the noisy input image rather than the prompt when making its predictions. In a way, this makes sense - many captions are only loosely related to their associated images, and so the model learns not to rely too heavily on the descriptions! However, this is undesirable when it comes time to generate new images - if the model doesn't follow the prompt, then we may get images out that don't relate to our description at all.

-![CFG scale demo grid](cfg_example_0_1_2_10.jpeg)<br>
+![CFG scale demo grid](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/diffusion-course/cfg_example_0_1_2_10.jpeg)<br>

_Images generated from the prompt "An oil painting of a collie in a top hat" with CFG scale 0, 1, 2 and 10 (left to right)_

To fix this, we use a trick called Classifier-Free Guidance (CFG). During training, text conditioning is sometimes kept blank, forcing the model to learn to denoise images with no text information whatsoever (unconditional generation). Then, at inference time, we make two separate predictions: one with the text prompt as conditioning and one without. We can then use the difference between these two predictions to create a final combined prediction that pushes **even further** in the direction indicated by the text-conditioned prediction according to some scaling factor (the guidance scale), hopefully resulting in an image that better matches the prompt. The image above shows the outputs for a prompt at different guidance scales - as you can see, higher values result in images that better match the description.
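The hunk above ends in the README's description of classifier-free guidance. As a companion to that paragraph, here is a minimal sketch of the combination step it describes, assuming `noise_pred_uncond` and `noise_pred_text` are the UNet's noise predictions without and with the text prompt (the helper and its names are illustrative, not part of the diff):

```python
import torch

def apply_cfg(
    noise_pred_uncond: torch.Tensor,
    noise_pred_text: torch.Tensor,
    guidance_scale: float = 7.5,
) -> torch.Tensor:
    """Combine unconditional and text-conditioned noise predictions.

    Pushes the final prediction further in the direction indicated by the
    text-conditioned prediction, scaled by the guidance scale: 0 ignores
    the prompt entirely, 1 reproduces the plain conditional prediction,
    and higher values follow the prompt more strongly.
    """
    return noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
```

This matches the behaviour shown in the CFG-scale image grid: at scale 0 the prompt has no effect, and the images track the description more closely as the scale increases.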
