-
Notifications
You must be signed in to change notification settings - Fork 6.1k
[docs] update guidance_scale
docstring for guidance_distilled models.
#11935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
@@ -691,7 +691,11 @@ def __call__( | |||
Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2. | |||
of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting | |||
`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to | |||
the text `prompt`, usually at the expense of lower image quality. | |||
the text `prompt`, usually at the expense of lower image quality. In case of Flux, which is a guidance- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "mimic" mean it still produces the same result as true CFG? If the effects are the same (despite the implementation), I'm not too sure the end-user will care or notice.
Maybe it'd be better to make a note of it on the Flux model card?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know the true CFG results I think. Maybe "resembles" is a better phrase?
If the effects are the same (despite the implementation), I'm not too sure the end-user will care or notice.
I think docstrings are important and it can be confusing to users if we're not putting the right phrases here.
Maybe it'd be better to make a note of it on the Flux model card?
Here, I disagree. I think just clarifying it at the docstring level is more than sufficient w.r.t the info already available in the model card (for example, we already mention that Dev is a guidance-distlled model).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, maybe lets go with something like this then?
"Guidance-distilled models don't implement true classifier-free guidance and for guidance_scale > 1
, it only resembles it."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resembles/mimics is very confusing terminology to me.
doing CFG on a guidance-distilled model with cfg_scale=X and embedded_cfg_scale=Y is effectively/approximately the same as doing CFG on base model with cfg_scale=X*Y based on how the math works out. this can easily be validated by running inference with same seed and making sure product of true and embedded scale is the same value. the results will not be the exact same but will be similar-ish (because distilled model is a noisy approximator of base model outputs)
better not to change explantation imo. if we want to, we can just say it is guidance-distilled, and leave the interested user to google it and find necessary information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I hear both of you. I have had complaints over DMs multiple times regarding this.
How about:
Guidance-distilled models (such as …) don't implement true classifier-free guidance and for guidance_scale > 1, it approximates its effects. Refer to https://arxiv.org/abs/2210.03142 for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that should be fine. @a-r-r-o-w good with you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. It's a bit weird here though that guidance_scale
actually means the embedded guidance scale, whereas we have true_cfg_scale
to actually mean guidance_scale
😞
Maybe clarifying this is very important
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree to that. @stevhliu how should go about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about? 😅
For guidance_scale
:
"Embedded guidance scale is enabled by setting guidance_scale
> 1. Higher guidance_scale
encourages a model to generate images more aligned with prompt
at the expense of lower image quality.
Guidance-distilled models approximates true classifier-free guidance for guidance_scale
> 1. Refer to the paper to learn more."
For true_cfg_scale
:
"True classifier-free guidance (guidance scale) is enabled when true_cfg_scale
> and
negative_prompt` is provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's go with that. Do you mind pushing to this PR directly? I can do it as well.
@@ -691,7 +691,11 @@ def __call__( | |||
Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2. | |||
of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting | |||
`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to | |||
the text `prompt`, usually at the expense of lower image quality. | |||
the text `prompt`, usually at the expense of lower image quality. In case of Flux, which is a guidance- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, maybe lets go with something like this then?
"Guidance-distilled models don't implement true classifier-free guidance and for guidance_scale > 1
, it only resembles it."
`guidance_scale > 1` just mimics it. In case of Flux, which is a guidance- distilled model, | ||
`guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying `guidance_scale > 1` | ||
just mimics it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a duplicate sentence
`guidance_scale > 1` just mimics it. In case of Flux, which is a guidance- distilled model, | |
`guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying `guidance_scale > 1` | |
just mimics it. | |
`guidance_scale > 1` just mimics it. |
@stevhliu thanks for the help! @a-r-r-o-w are we missing any other pipelines where this modification needs to be included? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Maybe the pipelines for HiDream, Chroma and HunyuanVideo need this change
Unrelated failing tests. |
What does this PR do?
Was feeling bored on the training journeys hence decided to open PRs. I am sure I am missing some other pipelines. What are those?