
Implement Frequency-Decoupled Guidance (FDG) as a Guider #11976


Open · wants to merge 8 commits into main
Conversation

dg845
Contributor

@dg845 dg845 commented Jul 23, 2025

What does this PR do?

This PR implements frequency-decoupled guidance (FDG) (paper), a new guidance strategy, as a guider. The idea behind FDG is to decompose the CFG prediction into low- and high-frequency components and apply guidance to each component separately via a CFG-style update (with separate guidance scales $w_{low}$ and $w_{high}$). The authors find that low guidance scales work better for the low-frequency components and high guidance scales work better for the high-frequency components (i.e., one should typically set $w_{low} < w_{high}$).

Fixes #11956.
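The core update described above can be sketched as follows. This is a simplified two-level illustration, not the PR's code: `low_pass` (a blur/downsample/upsample round trip) stands in for one level of the Laplacian-pyramid frequency transform $\psi$.

```python
import torch
import torch.nn.functional as F


def fdg_update(pred_cond, pred_uncond, w_low=5.0, w_high=10.0):
    """Frequency-decoupled guidance, two-level sketch.

    Decompose both predictions into low- and high-frequency parts,
    apply a CFG-style update per band, then recombine.
    """

    def low_pass(x):
        # Simple stand-in for one Laplacian pyramid level.
        down = F.avg_pool2d(x, kernel_size=2)
        return F.interpolate(down, scale_factor=2, mode="bilinear", align_corners=False)

    low_c, low_u = low_pass(pred_cond), low_pass(pred_uncond)
    high_c, high_u = pred_cond - low_c, pred_uncond - low_u

    # CFG-style update per frequency band, typically with w_low < w_high.
    guided_low = low_u + w_low * (low_c - low_u)
    guided_high = high_u + w_high * (high_c - high_u)
    return guided_low + guided_high
```

Note that when `w_low == w_high == w`, the bands recombine exactly into the standard CFG update `pred_uncond + w * (pred_cond - pred_uncond)`, so FDG strictly generalizes CFG.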

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@a-r-r-o-w
@yiyixuxu
@Msadat97

@dg845
Contributor Author

dg845 commented Jul 23, 2025

Some notes on the initial implementation:

  1. I have followed the paper implementation in Algorithm 2, which uses the kornia library to build a Laplacian pyramid as the frequency transform $\psi$. I'm not sure whether kornia is already a dependency of diffusers; it happens to be installed in the dev environment I'm using, but it doesn't appear to be explicitly pinned in setup.py.
  2. Right now, the FrequencyDecoupledGuidance class accepts guidance_scale_low and guidance_scale_high arguments in __init__ for $w_{low}$ and $w_{high}$, and similarly for other parameters such as parallel_weights_low/parallel_weights_high. Alternatively, we could accept a single argument such as guidance_scales: Tuple[float, ...] = (10.0, 5.0) for $w_{high} = 10$ and $w_{low} = 5$, and have all similar parameters (e.g. parallel_weights, guidance_rescale, etc.) be tuples of the same length. The latter approach is nice because it supports multiple frequency levels and makes it easy to decouple every parameter per frequency level, but it might be less usable if using only 2 levels (low and high frequency) is the dominant use case.
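A sketch of what the tuple-based signature from option 2 could look like (the parameter names mirror the ones above, but this is an illustration, not the PR's actual `__init__`):

```python
from typing import Optional, Tuple


class FrequencyDecoupledGuidance:
    """Sketch of a tuple-based __init__ (option 2 above).

    Scales are ordered from highest to lowest frequency level,
    e.g. (w_high, w_low) = (10.0, 5.0).
    """

    def __init__(
        self,
        guidance_scales: Tuple[float, ...] = (10.0, 5.0),
        guidance_rescale: Tuple[float, ...] = (0.0, 0.0),
        parallel_weights: Optional[Tuple[float, ...]] = None,
    ):
        # Every per-level tuple must match the number of frequency levels.
        if len(guidance_rescale) != len(guidance_scales):
            raise ValueError("All per-level tuples must have the same length.")
        if parallel_weights is not None and len(parallel_weights) != len(guidance_scales):
            raise ValueError("All per-level tuples must have the same length.")
        self.guidance_scales = guidance_scales
        self.guidance_rescale = guidance_rescale
        self.parallel_weights = parallel_weights
        # The number of Laplacian pyramid levels follows the tuple length.
        self.levels = len(guidance_scales)
```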

@Msadat97

Thank you for the quick implementation. Regarding your question, I believe it's cleaner to use tuples for the weights, as it allows users to seamlessly apply multiple levels when finer control over the generation is needed.

Member

@a-r-r-o-w a-r-r-o-w left a comment


@dg845 Thanks for taking it up, implementation looks great!

What you suggested about tuples sounds good, let's do that. We can always simplify the implementation later if needed, since modular guiders are experimental at the moment. (Plus, users can pass their own guider implementations, so anyone who wants a simpler version can easily take your implementation and make the necessary modifications.)

Let's not add kornia as a dependency. Instead, we can do the same thing done in the attention dispatcher (import only if the package is available):

```python
if _CAN_USE_FLASH_ATTN_3:
```

```python
import math
from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union

import kornia
```
Member


Could we add an is_kornia_available function to diffusers.utils.import_utils and import kornia only if the user already has it installed? A check could exist in __init__ as well, so that instantiating the FDG guider fails if kornia isn't available.

Contributor Author


I have added an is_kornia_available function to utils and added logic in the FDG guider to import from kornia only if it is available, following the Flash Attention 3 example above.
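The resulting pattern looks roughly like the following. This is a sketch of the guarded-import idea, not the exact code: the real `is_kornia_available` helper lives in `diffusers.utils.import_utils` and follows the conventions of the other `is_*_available` helpers there, while this standalone version uses `importlib` directly.

```python
import importlib.util


def is_kornia_available() -> bool:
    # True if the optional kornia package is installed.
    return importlib.util.find_spec("kornia") is not None


# Module level: import the optional dependency only when present.
if is_kornia_available():
    import kornia


class FrequencyDecoupledGuidance:
    def __init__(self, guidance_scales=(10.0, 5.0)):
        # Fail fast at instantiation if the optional dependency is missing.
        if not is_kornia_available():
            raise ImportError(
                "FrequencyDecoupledGuidance requires the `kornia` library. "
                "Install it with `pip install kornia`."
            )
        self.guidance_scales = guidance_scales
```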

@dg845 dg845 marked this pull request as ready for review July 24, 2025 00:36
@dg845 dg845 changed the title [WIP] Implement Frequency-Decoupled Guidance (FDG) as a Guider Implement Frequency-Decoupled Guidance (FDG) as a Guider Jul 24, 2025
@dg845
Contributor Author

dg845 commented Jul 24, 2025

Hi @Msadat97, quick question: how should FDG interact with guidance rescaling (from https://arxiv.org/pdf/2305.08891)? Currently, I'm rescaling in frequency space for each frequency level, with different guidance_rescale values allowed for different levels, but would it make more sense to rescale after the FDG prediction is mapped back to data space (in which case there would only be one guidance_rescale value for all frequency levels)?

@Msadat97

It seems more natural to perform a single rescaling at the end (after the FDG prediction) since FDG is meant to replace the CFG output. Rescaling in the frequency domain is also possible, but I can’t comment further as we haven’t tested FDG with guidance rescaling. Do you have any output comparisons for this?

@SahilCarterr
Contributor

Can you share a code snippet showing how to use FDG? @dg845

@Msadat97

@dg845 I noticed a mistake in the implementation. pred_cond and pred_uncond in the for loop should come from the Laplacian pyramid, but the current code uses the values in the data space. Could you please fix this? The correct approach is given in the paper:

[Screenshot of the relevant algorithm from the paper]
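The fix amounts to indexing the pyramid levels inside the loop rather than reusing the data-space tensors. A minimal sketch (with `build_laplacian_pyramid` as a simple stand-in for kornia's implementation, levels ordered from highest to lowest frequency to match the `(w_high, w_low)` tuple convention above):

```python
import torch
import torch.nn.functional as F


def build_laplacian_pyramid(x, levels=2):
    """Minimal Laplacian pyramid: high-frequency residuals first,
    coarsest (low-frequency) level last."""
    pyramid, current = [], x
    for _ in range(levels - 1):
        down = F.avg_pool2d(current, kernel_size=2)
        up = F.interpolate(down, size=current.shape[-2:], mode="bilinear", align_corners=False)
        pyramid.append(current - up)  # high-frequency residual
        current = down
    pyramid.append(current)  # coarsest level
    return pyramid


def fdg(pred_cond, pred_uncond, guidance_scales=(10.0, 5.0)):
    cond_pyr = build_laplacian_pyramid(pred_cond, len(guidance_scales))
    uncond_pyr = build_laplacian_pyramid(pred_uncond, len(guidance_scales))
    guided_levels = []
    for lvl, w in enumerate(guidance_scales):
        # The per-level predictions must come from the pyramid,
        # not from the data-space tensors.
        lvl_cond, lvl_uncond = cond_pyr[lvl], uncond_pyr[lvl]
        guided_levels.append(lvl_uncond + w * (lvl_cond - lvl_uncond))
    # Reconstruct: upsample the coarse level and add the residuals back.
    out = guided_levels[-1]
    for residual in reversed(guided_levels[:-1]):
        out = F.interpolate(out, size=residual.shape[-2:], mode="bilinear", align_corners=False)
        out = out + residual
    return out
```

As a sanity check, setting every guidance scale to 1.0 should reproduce `pred_cond` exactly, since the guided pyramid then equals the conditional pyramid.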

Successfully merging this pull request may close these issues.

Frequency-Decoupled Guidance (FDG) for diffusion models