Implement Frequency-Decoupled Guidance (FDG) as a Guider #11976
Some notes on the initial implementation:
Thank you for the quick implementation. Regarding your question, I believe it's cleaner to use tuples for the weights, as it allows users to seamlessly apply multiple levels when finer control over the generation is needed.
@dg845 Thanks for taking it up, implementation looks great!
What you suggested about tuples sounds good, let's do that. We can always update the implementation later if we need to simplify, since modular guiders are experimental at the moment. Users can also pass their own guider implementations, so if someone wants to simplify, it will be quite easy to take your implementation and make the necessary modifications.
Let's not add kornia as a dependency. Instead, we can do the same thing that is done in the attention dispatcher (import only if the package is available):
if _CAN_USE_FLASH_ATTN_3:
import math
from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union

import kornia
Could we add an `is_kornia_available` to `diffusers.utils.import_utils` and import only if the user already has it installed? A check could exist in `__init__` as well, so that instantiating the FDG guider fails if kornia isn't available.
I have added an `is_kornia_available` function to `utils` and added logic in the FDG guider to only import from `kornia` if available, following the Flash Attention 3 example above.
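For readers following along, the optional-dependency pattern discussed above might look roughly like the sketch below. It is not the code from this PR: the body of `is_kornia_available` is an assumed implementation, and `is_package_available` and `GuiderNeedingKornia` are hypothetical names introduced only for illustration.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if `name` is importable, without actually importing it."""
    return importlib.util.find_spec(name) is not None


def is_kornia_available() -> bool:
    # Assumed implementation; the real helper in diffusers may differ.
    return is_package_available("kornia")


class GuiderNeedingKornia:
    """Hypothetical stand-in for a guider that depends on an optional package."""

    def __init__(self, required_package: str = "kornia"):
        # Fail at construction time with a clear message instead of at first use.
        if not is_package_available(required_package):
            raise ImportError(
                f"This guider requires `{required_package}`; "
                f"install it with `pip install {required_package}`."
            )
        self.required_package = required_package
```

Checking in `__init__` keeps the module importable for users who never touch the guider, while still surfacing a clear error to those who do.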
Hi @Msadat97, quick question: how should FDG interact with guidance rescaling (from https://arxiv.org/pdf/2305.08891)? Currently, I'm rescaling in frequency space for each frequency level, with different
It seems more natural to perform a single rescaling at the end (after the FDG prediction) since FDG is meant to replace the CFG output. Rescaling in the frequency domain is also possible, but I can’t comment further as we haven’t tested FDG with guidance rescaling. Do you have any output comparisons for this?
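To make the "single rescaling at the end" option concrete, here is a minimal pure-Python sketch of guidance rescaling as described in the linked paper, applied to an already-guided prediction. It assumes flat lists of floats for one sample (a real implementation would work on tensors and take the standard deviation per sample over the non-batch dimensions), and the function name is hypothetical.

```python
import statistics


def rescale_guided_prediction(guided, cond, rescale):
    """Pull the std of the guided prediction back toward the conditional one.

    guided:  final guided prediction (e.g. the FDG output) for one sample
    cond:    conditional prediction for the same sample
    rescale: mixing factor in [0, 1]; 0 disables rescaling
    """
    std_cond = statistics.pstdev(cond)
    std_guided = statistics.pstdev(guided)
    if std_guided == 0.0:
        # Nothing to rescale for a constant prediction.
        return list(guided)
    # Blend the fully rescaled prediction with the original one.
    factor = rescale * (std_cond / std_guided) + (1.0 - rescale)
    return [g * factor for g in guided]
```

With `rescale=1.0` the output has the same standard deviation as `cond`; with `rescale=0.0` it is returned unchanged.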
@dg845 Can you share a code snippet showing how to use FDG?
@dg845 I noticed a mistake in the implementation.
What does this PR do?
This PR implements frequency-decoupled guidance (FDG) (paper), a new guidance strategy, as a guider. The idea behind FDG is to decompose the CFG prediction into low- and high-frequency components and apply guidance separately to each via a CFG-style update (with separate guidance scales $w_{low}$ and $w_{high}$). The authors find that low guidance scales work better for the low-frequency components while high guidance scales work better for the high-frequency components (i.e. you should set $w_{low} < w_{high}$).
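The idea can be illustrated with a small, self-contained sketch on 1D signals. This is not the guider added in this PR: it assumes a two-band split using a moving-average low-pass filter (the PR's actual decomposition relies on kornia and is not shown in this thread), and all function names are hypothetical.

```python
def moving_average(signal, radius=1):
    """Simple low-pass filter: average over a window clipped at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def split_bands(signal, radius=1):
    """Split into (low, high) so that low[i] + high[i] == signal[i]."""
    low = moving_average(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def cfg_update(cond, uncond, scale):
    """Standard CFG-style update: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def fdg_two_band(cond, uncond, w_low, w_high, radius=1):
    """Apply a separate CFG-style update to each frequency band, then recombine."""
    cond_low, cond_high = split_bands(cond, radius)
    uncond_low, uncond_high = split_bands(uncond, radius)
    low = cfg_update(cond_low, uncond_low, w_low)
    high = cfg_update(cond_high, uncond_high, w_high)
    return [l + h for l, h in zip(low, high)]
```

Because the split is linear and the bands sum back to the input, setting `w_low == w_high` in this sketch recovers plain CFG; the two only differ once the scales are decoupled.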
Fixes #11956.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@a-r-r-o-w
@yiyixuxu
@Msadat97