Commit 348b07a

Merge branch 'main' into modular-diffusers-wan
2 parents fd3a2b6 + 7ae6347 commit 348b07a

12 files changed, +124 -69 lines changed

src/diffusers/pipelines/chroma/pipeline_chroma.py

Lines changed: 5 additions & 5 deletions
@@ -663,11 +663,11 @@ def __call__(
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
             guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
-                `guidance_scale` is defined as `w` of equation 2. of [Imagen
-                Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
-                1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,
-                usually at the expense of lower image quality.
+                Embedded guidance scale is enabled by setting `guidance_scale` > 1. Higher `guidance_scale` encourages
+                a model to generate images more aligned with `prompt` at the expense of lower image quality.
+
+                Guidance-distilled models approximate true classifier-free guidance for `guidance_scale` > 1. Refer to
+                the [paper](https://huggingface.co/papers/2210.03142) to learn more.
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):

src/diffusers/pipelines/chroma/pipeline_chroma_img2img.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -725,11 +725,11 @@ def __call__(
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
             guidance_scale (`float`, *optional*, defaults to 5.0):
-                Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
-                `guidance_scale` is defined as `w` of equation 2. of [Imagen
-                Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
-                1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,
-                usually at the expense of lower image quality.
+                Embedded guidance scale is enabled by setting `guidance_scale` > 1. Higher `guidance_scale` encourages
+                a model to generate images more aligned with `prompt` at the expense of lower image quality.
+
+                Guidance-distilled models approximate true classifier-free guidance for `guidance_scale` > 1. Refer to
+                the [paper](https://huggingface.co/papers/2210.03142) to learn more.
             strength (`float, *optional*, defaults to 0.9):
                 Conceptually, indicates how much to transform the reference image. Must be between 0 and 1. image will
                 be used as a starting point, adding more noise to it the larger the strength. The number of denoising

src/diffusers/pipelines/flux/pipeline_flux.py

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -674,7 +674,8 @@ def __call__(
                 The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and
                 `text_encoder_2`. If not defined, `negative_prompt` is used in all the text-encoders.
             true_cfg_scale (`float`, *optional*, defaults to 1.0):
-                When > 1.0 and a provided `negative_prompt`, enables true classifier-free guidance.
+                True classifier-free guidance (guidance scale) is enabled when `true_cfg_scale` > 1 and
+                `negative_prompt` is provided.
             height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
                 The height in pixels of the generated image. This is set to 1024 by default for the best results.
             width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
@@ -687,11 +688,11 @@ def __call__(
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
             guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
+                Embedded guidance scale is enabled by setting `guidance_scale` > 1. Higher `guidance_scale` encourages
+                a model to generate images more aligned with `prompt` at the expense of lower image quality.
+
+                Guidance-distilled models approximate true classifier-free guidance for `guidance_scale` > 1. Refer to
+                the [paper](https://huggingface.co/papers/2210.03142) to learn more.
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
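
The two docstrings in this file separate true classifier-free guidance (`true_cfg_scale` together with a `negative_prompt`) from the embedded guidance that a guidance-distilled model receives as an input (`guidance_scale`). The sketch below illustrates only the first of these, using the standard classifier-free guidance combination on plain Python lists. The helper names `uses_true_cfg` and `true_cfg_combine` are hypothetical and chosen for illustration; they are not claimed to be the pipeline's actual implementation.

```python
from typing import List


def uses_true_cfg(true_cfg_scale: float, has_negative_prompt: bool) -> bool:
    """Condition from the docstring: scale > 1 AND a negative prompt is provided."""
    return true_cfg_scale > 1 and has_negative_prompt


def true_cfg_combine(
    cond: List[float], uncond: List[float], true_cfg_scale: float
) -> List[float]:
    """Standard CFG combination: uncond + scale * (cond - uncond), element-wise.

    `cond` stands in for the prediction made with the prompt and `uncond` for
    the prediction made with the negative prompt (both are illustrative lists).
    """
    return [u + true_cfg_scale * (c - u) for c, u in zip(cond, uncond)]


cond_pred = [1.0, 2.0]
uncond_pred = [0.0, 1.0]

# A scale of 1.0 reproduces the conditional prediction unchanged.
same = true_cfg_combine(cond_pred, uncond_pred, 1.0)

# A scale of 2.0 moves further along the (cond - uncond) direction.
pushed = true_cfg_combine(cond_pred, uncond_pred, 2.0)
```

With a scale of exactly 1 the combination returns the conditional prediction unchanged, which is consistent with the docstring enabling true classifier-free guidance only for `true_cfg_scale` > 1.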

src/diffusers/pipelines/flux/pipeline_flux_control.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -661,11 +661,11 @@ def __call__(
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
             guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
+                Embedded guidance scale is enabled by setting `guidance_scale` > 1. Higher `guidance_scale` encourages
+                a model to generate images more aligned with prompt at the expense of lower image quality.
+
+                Guidance-distilled models approximate true classifier-free guidance for `guidance_scale` > 1. Refer to
+                the [paper](https://huggingface.co/papers/2210.03142) to learn more.
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):

src/diffusers/pipelines/flux/pipeline_flux_kontext.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -795,11 +795,11 @@ def __call__(
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
             guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
+                Embedded guidance scale is enabled by setting `guidance_scale` > 1. Higher `guidance_scale` encourages
+                a model to generate images more aligned with prompt at the expense of lower image quality.
+
+                Guidance-distilled models approximate true classifier-free guidance for `guidance_scale` > 1. Refer to
+                the [paper](https://huggingface.co/papers/2210.03142) to learn more.
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):

src/diffusers/pipelines/flux/pipeline_flux_kontext_inpaint.py

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -989,7 +989,8 @@ def __call__(
                 The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and
                 `text_encoder_2`. If not defined, `negative_prompt` is used in all the text-encoders.
             true_cfg_scale (`float`, *optional*, defaults to 1.0):
-                When > 1.0 and a provided `negative_prompt`, enables true classifier-free guidance.
+                True classifier-free guidance (guidance scale) is enabled when `true_cfg_scale` > 1 and
+                `negative_prompt` is provided.
             height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
                 The height in pixels of the generated image. This is set to 1024 by default for the best results.
             width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
@@ -1015,11 +1016,11 @@ def __call__(
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
             guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
+                Embedded guidance scale is enabled by setting `guidance_scale` > 1. Higher `guidance_scale` encourages
+                a model to generate images more aligned with `prompt` at the expense of lower image quality.
+
+                Guidance-distilled models approximate true classifier-free guidance for `guidance_scale` > 1. Refer to
+                the [paper](https://huggingface.co/papers/2210.03142) to learn more.
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):

src/diffusers/pipelines/hidream_image/pipeline_hidream_image.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -763,11 +763,11 @@ def __call__(
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
             guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
+                Embedded guidance scale is enabled by setting `guidance_scale` > 1. Higher `guidance_scale` encourages
+                a model to generate images more aligned with `prompt` at the expense of lower image quality.
+
+                Guidance-distilled models approximate true classifier-free guidance for `guidance_scale` > 1. Refer to
+                the [paper](https://huggingface.co/papers/2210.03142) to learn more.
             negative_prompt (`str` or `List[str]`, *optional*):
                 The prompt or prompts not to guide the image generation. If not defined, one has to pass
                 `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `true_cfg_scale` is

src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py

Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -529,15 +529,14 @@ def __call__(
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
             true_cfg_scale (`float`, *optional*, defaults to 1.0):
-                When > 1.0 and a provided `negative_prompt`, enables true classifier-free guidance.
+                True classifier-free guidance (guidance scale) is enabled when `true_cfg_scale` > 1 and
+                `negative_prompt` is provided.
             guidance_scale (`float`, defaults to `6.0`):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality. Note that the only available
-                HunyuanVideo model is CFG-distilled, which means that traditional guidance between unconditional and
-                conditional latent is not applied.
+                Embedded guidance scale is enabled by setting `guidance_scale` > 1. Higher `guidance_scale` encourages
+                a model to generate images more aligned with `prompt` at the expense of lower image quality.
+
+                Guidance-distilled models approximate true classifier-free guidance for `guidance_scale` > 1. Refer to
+                the [paper](https://huggingface.co/papers/2210.03142) to learn more.
             num_videos_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):

src/diffusers/pipelines/sana/pipeline_sana_sprint.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -643,11 +643,11 @@ def __call__(
                 in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
                 passed will be used. Must be in descending order.
             guidance_scale (`float`, *optional*, defaults to 4.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
+                Embedded guidance scale is enabled by setting `guidance_scale` > 1. Higher `guidance_scale` encourages
+                a model to generate images more aligned with `prompt` at the expense of lower image quality.
+
+                Guidance-distilled models approximate true classifier-free guidance for `guidance_scale` > 1. Refer to
+                the [paper](https://huggingface.co/papers/2210.03142) to learn more.
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             height (`int`, *optional*, defaults to self.unet.config.sample_size):

tests/pipelines/wan/test_wan.py

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,6 @@
 import gc
 import unittest
 
-import numpy as np
 import torch
 from transformers import AutoTokenizer, T5EncoderModel
 
@@ -29,9 +28,7 @@
 )
 
 from ..pipeline_params import TEXT_TO_IMAGE_BATCH_PARAMS, TEXT_TO_IMAGE_IMAGE_PARAMS, TEXT_TO_IMAGE_PARAMS
-from ..test_pipelines_common import (
-    PipelineTesterMixin,
-)
+from ..test_pipelines_common import PipelineTesterMixin
 
 
 enable_full_determinism()
@@ -127,11 +124,15 @@ def test_inference(self):
         inputs = self.get_dummy_inputs(device)
         video = pipe(**inputs).frames
         generated_video = video[0]
-
         self.assertEqual(generated_video.shape, (9, 3, 16, 16))
-        expected_video = torch.randn(9, 3, 16, 16)
-        max_diff = np.abs(generated_video - expected_video).max()
-        self.assertLessEqual(max_diff, 1e10)
+
+        # fmt: off
+        expected_slice = torch.tensor([0.4525, 0.452, 0.4485, 0.4534, 0.4524, 0.4529, 0.454, 0.453, 0.5127, 0.5326, 0.5204, 0.5253, 0.5439, 0.5424, 0.5133, 0.5078])
+        # fmt: on
+
+        generated_slice = generated_video.flatten()
+        generated_slice = torch.cat([generated_slice[:8], generated_slice[-8:]])
+        self.assertTrue(torch.allclose(generated_slice, expected_slice, atol=1e-3))
 
     @unittest.skip("Test not supported")
     def test_attention_slicing_forward_pass(self):
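
The rewritten `test_inference` replaces a comparison against a random reference with a `1e10` tolerance, which could hardly fail, with a comparison of the first and last 8 flattened values against fixed expected numbers. The following is a minimal pure-Python sketch of that pattern, written without `torch` so it runs anywhere. The helpers `edge_slice` and `all_close` are hypothetical stand-ins, and `all_close` applies only an absolute tolerance, whereas `torch.allclose` also adds a relative tolerance term.

```python
from typing import List, Sequence


def edge_slice(values: Sequence[float], n: int = 8) -> List[float]:
    """Return the first n and last n entries of a flattened sequence."""
    return list(values[:n]) + list(values[-n:])


def all_close(
    actual: Sequence[float], expected: Sequence[float], atol: float = 1e-3
) -> bool:
    """Absolute-tolerance comparison (a simplification of torch.allclose)."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol for a, e in zip(actual, expected))


# Stand-in for a flattened output: 20 values instead of 9 * 3 * 16 * 16.
flat_output = [float(i) for i in range(20)]
picked = edge_slice(flat_output)

# A deviation of 0.0005 is within atol=1e-3; a deviation of 0.002 is not.
within_tolerance = all_close([0.4525], [0.4530])
outside_tolerance = all_close([0.4525], [0.4545])
```

Comparing a fixed slice keeps the expected values short enough to store inline while still detecting numerical regressions that a shape-only check would miss.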
