Skip to content

[docs] update guidance_scale docstring for guidance_distilled models. #11935

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 23, 2025

Conversation

sayakpaul
Copy link
Member

What does this PR do?

Was feeling bored on the training journeys hence decided to open PRs. I am sure I am missing some other pipelines. What are those?

@sayakpaul sayakpaul requested review from stevhliu and a-r-r-o-w July 15, 2025 19:59
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@@ -691,7 +691,11 @@ def __call__(
Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
the text `prompt`, usually at the expense of lower image quality.
the text `prompt`, usually at the expense of lower image quality. In case of Flux, which is a guidance-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "mimic" mean it still produces the same result as true CFG? If the effects are the same (despite the implementation), I'm not too sure the end-user will care or notice.

Maybe it'd be better to make a note of it on the Flux model card?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know the true CFG results I think. Maybe "resembles" is a better phrase?

If the effects are the same (despite the implementation), I'm not too sure the end-user will care or notice.

I think docstrings are important and it can be confusing to users if we're not putting the right phrases here.

Maybe it'd be better to make a note of it on the Flux model card?

Here, I disagree. I think just clarifying it at the docstring level is more than sufficient w.r.t the info already available in the model card (for example, we already mention that Dev is a guidance-distlled model).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, maybe lets go with something like this then?

"Guidance-distilled models don't implement true classifier-free guidance and for guidance_scale > 1, it only resembles it."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resembles/mimics is very confusing terminology to me.

doing CFG on a guidance-distilled model with cfg_scale=X and embedded_cfg_scale=Y is effectively/approximately the same as doing CFG on base model with cfg_scale=X*Y based on how the math works out. this can easily be validated by running inference with same seed and making sure product of true and embedded scale is the same value. the results will not be the exact same but will be similar-ish (because distilled model is a noisy approximator of base model outputs)

better not to change explantation imo. if we want to, we can just say it is guidance-distilled, and leave the interested user to google it and find necessary information

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I hear both of you. I have had complaints over DMs multiple times regarding this.

How about:

Guidance-distilled models (such as …) don't implement true classifier-free guidance and for guidance_scale > 1, it approximates its effects. Refer to https://arxiv.org/abs/2210.03142 for more details.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that should be fine. @a-r-r-o-w good with you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. It's a bit weird here though that guidance_scale actually means the embedded guidance scale, whereas we have true_cfg_scale to actually mean guidance_scale 😞

Maybe clarifying this is very important

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to that. @stevhliu how should go about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about? 😅

For guidance_scale:

"Embedded guidance scale is enabled by setting guidance_scale > 1. Higher guidance_scale encourages a model to generate images more aligned with prompt at the expense of lower image quality.

Guidance-distilled models approximates true classifier-free guidance for guidance_scale > 1. Refer to the paper to learn more."

For true_cfg_scale:

"True classifier-free guidance (guidance scale) is enabled when true_cfg_scale > andnegative_prompt` is provided.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's go with that. Do you mind pushing to this PR directly? I can do it as well.

@@ -691,7 +691,11 @@ def __call__(
Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
the text `prompt`, usually at the expense of lower image quality.
the text `prompt`, usually at the expense of lower image quality. In case of Flux, which is a guidance-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, maybe lets go with something like this then?

"Guidance-distilled models don't implement true classifier-free guidance and for guidance_scale > 1, it only resembles it."

Comment on lines 696 to 698
`guidance_scale > 1` just mimics it. In case of Flux, which is a guidance- distilled model,
`guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying `guidance_scale > 1`
just mimics it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a duplicate sentence

Suggested change
`guidance_scale > 1` just mimics it. In case of Flux, which is a guidance- distilled model,
`guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying `guidance_scale > 1`
just mimics it.
`guidance_scale > 1` just mimics it.

@sayakpaul
Copy link
Member Author

@stevhliu thanks for the help! @a-r-r-o-w are we missing any other pipelines where this modification needs to be included?

Copy link
Member

@a-r-r-o-w a-r-r-o-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Maybe the pipelines for HiDream, Chroma and HunyuanVideo need this change

@sayakpaul
Copy link
Member Author

Unrelated failing tests.

@sayakpaul sayakpaul merged commit 7ae6347 into main Jul 23, 2025
14 of 15 checks passed
@sayakpaul sayakpaul deleted the guidance-scale-docs branch July 23, 2025 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants