
Commit f3e0911

a-r-r-o-w and stevhliu authored
Improve Wan docstrings (#11689)
* improve docstrings for wan
* Apply suggestions from code review
* make style

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
1 parent 9154566 commit f3e0911

4 files changed, +46 −16 lines changed

src/diffusers/pipelines/wan/pipeline_wan.py

Lines changed: 7 additions & 4 deletions
@@ -388,8 +388,10 @@ def __call__(

         Args:
             prompt (`str` or `List[str]`, *optional*):
-                The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`.
-                instead.
+                The prompt or prompts to guide the image generation. If not defined, pass `prompt_embeds` instead.
+            negative_prompt (`str` or `List[str]`, *optional*):
+                The prompt or prompts to avoid during image generation. If not defined, pass `negative_prompt_embeds`
+                instead. Ignored when not using guidance (`guidance_scale` < `1`).
             height (`int`, defaults to `480`):
                 The height in pixels of the generated image.
             width (`int`, defaults to `832`):
@@ -434,8 +436,9 @@ def __call__(
                 The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
                 will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the
                 `._callback_tensor_inputs` attribute of your pipeline class.
-            autocast_dtype (`torch.dtype`, *optional*, defaults to `torch.bfloat16`):
-                The dtype to use for the torch.amp.autocast.
+            max_sequence_length (`int`, defaults to `512`):
+                The maximum sequence length of the text encoder. If the prompt is longer than this, it will be
+                truncated. If the prompt is shorter, it will be padded to this length.

         Examples:

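For reference, a minimal text-to-video sketch that exercises the parameters documented above (`prompt`, `negative_prompt`, `max_sequence_length`). The checkpoint name, prompt text, and output settings are illustrative assumptions, not part of this commit:

```python
# Hypothetical usage sketch for WanPipeline; the checkpoint name and values below are
# assumptions for illustration, not taken from this commit.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

output = pipe(
    prompt="A cat walks on the grass, realistic style",
    negative_prompt="blurry, low quality, distorted",  # ignored when guidance_scale < 1
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
    max_sequence_length=512,  # longer prompts are truncated, shorter ones padded
).frames[0]

export_to_video(output, "t2v_output.mp4", fps=16)
```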

src/diffusers/pipelines/wan/pipeline_wan_i2v.py

Lines changed: 4 additions & 6 deletions
@@ -562,12 +562,10 @@ def __call__(
                 The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
                 will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the
                 `._callback_tensor_inputs` attribute of your pipeline class.
-            max_sequence_length (`int`, *optional*, defaults to `512`):
-                The maximum sequence length of the prompt.
-            shift (`float`, *optional*, defaults to `5.0`):
-                The shift of the flow.
-            autocast_dtype (`torch.dtype`, *optional*, defaults to `torch.bfloat16`):
-                The dtype to use for the torch.amp.autocast.
+            max_sequence_length (`int`, defaults to `512`):
+                The maximum sequence length of the text encoder. If the prompt is longer than this, it will be
+                truncated. If the prompt is shorter, it will be padded to this length.
+
         Examples:

         Returns:
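As a companion to the revised `max_sequence_length` entry, a hedged image-to-video sketch; the checkpoint name and the input-image path are placeholders:

```python
# Hypothetical usage sketch for WanImageToVideoPipeline; the checkpoint name and image
# path are placeholders, not taken from this commit.
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = load_image("path/to/first_frame.png")  # placeholder input image
output = pipe(
    image=image,
    prompt="The subject slowly turns toward the camera",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
    max_sequence_length=512,  # longer prompts are truncated, shorter ones padded
).frames[0]

export_to_video(output, "i2v_output.mp4", fps=16)
```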

src/diffusers/pipelines/wan/pipeline_wan_vace.py

Lines changed: 29 additions & 3 deletions
@@ -687,8 +687,33 @@ def __call__(

         Args:
             prompt (`str` or `List[str]`, *optional*):
-                The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`.
+                The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
                 instead.
+            negative_prompt (`str` or `List[str]`, *optional*):
+                The prompt or prompts not to guide the image generation. If not defined, one has to pass
+                `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
+                less than `1`).
+            video (`List[PIL.Image.Image]`, *optional*):
+                The input video or videos to be used as a starting point for the generation. The video should be a list
+                of PIL images, a numpy array, or a torch tensor. Currently, the pipeline only supports generating one
+                video at a time.
+            mask (`List[PIL.Image.Image]`, *optional*):
+                The input mask defines which video regions to condition on and which to generate. Black areas in the
+                mask indicate conditioning regions, while white areas indicate regions for generation. The mask should
+                be a list of PIL images, a numpy array, or a torch tensor. Currently supports generating a single video
+                at a time.
+            reference_images (`List[PIL.Image.Image]`, *optional*):
+                A list of one or more reference images as extra conditioning for the generation. For example, if you
+                are trying to inpaint a video to change the character, you can pass reference images of the new
+                character here. Refer to the Diffusers [examples](https://github.com/huggingface/diffusers/pull/11582)
+                and original [user
+                guide](https://github.com/ali-vilab/VACE/blob/0897c6d055d7d9ea9e191dce763006664d9780f8/UserGuide.md)
+                for a full list of supported tasks and use cases.
+            conditioning_scale (`float`, `List[float]`, `torch.Tensor`, defaults to `1.0`):
+                The conditioning scale to be applied when adding the control conditioning latent stream to the
+                denoising latent stream in each control layer of the model. If a float is provided, it will be applied
+                uniformly to all layers. If a list or tensor is provided, it should have the same length as the number
+                of control layers in the model (`len(transformer.config.vace_layers)`).
             height (`int`, defaults to `480`):
                 The height in pixels of the generated image.
             width (`int`, defaults to `832`):
@@ -733,8 +758,9 @@ def __call__(
                 The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
                 will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the
                 `._callback_tensor_inputs` attribute of your pipeline class.
-            autocast_dtype (`torch.dtype`, *optional*, defaults to `torch.bfloat16`):
-                The dtype to use for the torch.amp.autocast.
+            max_sequence_length (`int`, defaults to `512`):
+                The maximum sequence length of the text encoder. If the prompt is longer than this, it will be
+                truncated. If the prompt is shorter, it will be padded to this length.

         Examples:

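To ground the new VACE parameter descriptions, a hedged sketch of a reference-image-conditioned generation. The checkpoint name, the all-white mask setup, and the file paths are assumptions for illustration; see the linked PR and VACE user guide above for the actually supported tasks:

```python
# Hypothetical usage sketch for WanVACEPipeline; checkpoint name, mask/video setup, and
# paths are assumptions for illustration, not taken from this commit.
import torch
import PIL.Image
from diffusers import WanVACEPipeline
from diffusers.utils import export_to_video

pipe = WanVACEPipeline.from_pretrained("Wan-AI/Wan2.1-VACE-1.3B-diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

num_frames, height, width = 81, 480, 832
# Per the docstring: black mask areas are conditioning regions, white areas are generated.
# An all-white mask generates every frame, conditioned only on the reference image.
video = [PIL.Image.new("RGB", (width, height), (0, 0, 0)) for _ in range(num_frames)]
mask = [PIL.Image.new("RGB", (width, height), (255, 255, 255)) for _ in range(num_frames)]
reference_images = [PIL.Image.open("path/to/character.png").convert("RGB")]  # placeholder

output = pipe(
    prompt="The character walks through a neon-lit street at night",
    video=video,
    mask=mask,
    reference_images=reference_images,
    conditioning_scale=1.0,  # or a list/tensor with len(transformer.config.vace_layers) entries
    height=height,
    width=width,
    num_frames=num_frames,
    max_sequence_length=512,
).frames[0]

export_to_video(output, "vace_output.mp4", fps=16)
```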

src/diffusers/pipelines/wan/pipeline_wan_video2video.py

Lines changed: 6 additions & 3 deletions
@@ -508,7 +508,7 @@ def __call__(

         Args:
             prompt (`str` or `List[str]`, *optional*):
-                The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`.
+                The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
                 instead.
             height (`int`, defaults to `480`):
                 The height in pixels of the generated image.
@@ -525,6 +525,8 @@ def __call__(
                 of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
                 `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
                 the text `prompt`, usually at the expense of lower image quality.
+            strength (`float`, defaults to `0.8`):
+                Higher strength leads to more differences between original image and generated video.
             num_videos_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
@@ -554,8 +556,9 @@ def __call__(
                 The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
                 will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the
                 `._callback_tensor_inputs` attribute of your pipeline class.
-            autocast_dtype (`torch.dtype`, *optional*, defaults to `torch.bfloat16`):
-                The dtype to use for the torch.amp.autocast.
+            max_sequence_length (`int`, defaults to `512`):
+                The maximum sequence length of the text encoder. If the prompt is longer than this, it will be
+                truncated. If the prompt is shorter, it will be padded to this length.

         Examples:

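A hedged video-to-video sketch showing where the newly documented `strength` and `max_sequence_length` parameters fit; the checkpoint name and input path are placeholders:

```python
# Hypothetical usage sketch for WanVideoToVideoPipeline; the checkpoint name and video
# path are placeholders, not taken from this commit.
import torch
from diffusers import WanVideoToVideoPipeline
from diffusers.utils import export_to_video, load_video

pipe = WanVideoToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

video = load_video("path/to/input.mp4")  # placeholder source clip
output = pipe(
    video=video,
    prompt="The same scene rendered as a watercolor painting",
    strength=0.8,  # higher strength departs further from the input video
    guidance_scale=5.0,
    max_sequence_length=512,  # longer prompts are truncated, shorter ones padded
).frames[0]

export_to_video(output, "v2v_output.mp4", fps=16)
```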
