Conversation
@CharlelieLrt CharlelieLrt commented Sep 5, 2025

PhysicsNeMo Pull Request

Description

Note: this PR is currently a draft and is open for feedback and suggestions. It will not be merged in its current form and might be broken down into smaller PRs.

Objectives

Overarching objective: refactor and improve all utilities and models related to diffusion in order to consolidate these components into a framework. More precisely, these diffusion components should:

  • Support all current diffusion use cases (CorrDiff, cBottle, StormCast, etc.) out of the box
  • Be composable
  • Be extendable
  • Be well-documented

PR objective: focuses on the EDM samplers (stochastic_sampler and deterministic_sampler). For these samplers, the objectives are to:

  1. Be agnostic to the diffusion model
  2. Be agnostic to the modality of the latent state x
  3. Be agnostic to the modality of the conditioning (for conditional diffusion models)
  4. Support a large range of guidance methods for plug-and-play generation (e.g. DPS)
  5. Support multiple implementations of multi-diffusion

Solutions

  1. Refactored the stochastic_sampler functional interface into an object-oriented interface to facilitate future extensibility.

  2. Model agnostic: relies on a callback whose signature is assumed invariant. To satisfy this invariance constraint, we also provide a surface adapter (i.e. a thin wrapper) that adapts the signature of any given Module to ensure compliance.

  3. Latent state agnostic: simple refactors to avoid unnecessary assumptions on the shape of the latent state.

  4. Conditioning agnostic: all conditioning variables are packed into a dict of tensors. The sampler never accesses the conditioning, since the model is responsible for handling all conditioning ops.

  5. Plug-and-play guidance: relies on callbacks passed to the sampler. Introduces a guidance API to facilitate creating these callbacks, and ensure compliance with the sampler requirements. For now two types of guidance are provided (model-based guidance for DPS, and data consistency guidance for inpainting/outpainting/channel infilling, etc.)

  6. Multi-diffusion: TBD. For now the plan is to defer multi-diffusion ops to a dedicated model wrapper.
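To make points (2) and (4) above concrete, here is a minimal sketch of what a signature-normalizing wrapper could look like. The name `SignatureAdapter`, the `arg_map` argument, and the `forward(x, sigma, condition=None)` callback signature are illustrative assumptions, not the PR's actual API.

```python
import torch


class SignatureAdapter(torch.nn.Module):
    """Hypothetical sketch of a thin "surface adapter" that exposes an
    invariant callback signature to the sampler. Names and signature are
    assumptions for illustration, not the PR's actual API."""

    def __init__(self, model: torch.nn.Module, arg_map: dict):
        super().__init__()
        self.model = model
        # Maps keys of the conditioning dict to the wrapped model's kwarg names
        self.arg_map = arg_map

    def forward(self, x, sigma, condition=None):
        condition = condition or {}
        # Repack the conditioning dict into the kwargs the wrapped model
        # expects; the sampler itself never inspects the conditioning.
        kwargs = {self.arg_map.get(k, k): v for k, v in condition.items()}
        return self.model(x, sigma, **kwargs)
```

A model expecting, say, a `lead_time` keyword could then be driven by the sampler through the invariant signature, with all conditioning handled on the model side.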

Remaining items

- Multi-diffusion: in their current implementation, the samplers use a patching object to extract patches from the latent state x; they also call methods from the model to extract the global positional embedding of these patches. These strong assumptions on the model API are not compatible with objective (1) above. A better solution might be to defer all multi-diffusion ops to a model wrapper that extracts patches, gets the positional embeddings, etc.

- Guidance: only pre-defined guidances are supported; there is no mechanism for user-defined guidance. The range of available off-the-shelf guidances should also be extended.

- Model-based guidance: only supports models that process batches, are implemented in PyTorch, and are compatible with torch.autograd.
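To illustrate why torch.autograd compatibility is required, a DPS-style guidance gradient can be sketched as follows. The function name `dps_guidance_sketch`, the `forward_op` operator, and the `scale` factor are hypothetical; this is a sketch of the general technique, not the PR's implementation.

```python
import torch


def dps_guidance_sketch(x0_hat, x_hat, y, forward_op, scale=1.0):
    """Illustrative DPS-style guidance term (hypothetical, not the PR's code):
    differentiates a data-fidelity loss between an observation ``y`` and the
    forward-mapped denoised estimate ``x0_hat`` with respect to the noisy
    state ``x_hat``."""
    residual = y - forward_op(x0_hat)
    loss = 0.5 * residual.pow(2).sum()
    # This call is why the guided model must be implemented in PyTorch and be
    # compatible with torch.autograd: the loss gradient is back-propagated
    # through the denoiser to the noisy state.
    (grad,) = torch.autograd.grad(loss, x_hat)
    return -scale * grad
```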

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
@CharlelieLrt CharlelieLrt self-assigned this Sep 19, 2025
err1 = torch.abs(y - y_x0) ** self.norm_ord  # (*_y,)

# Compute log-likelihood p(y|x_0_hat)

This is relatively specific to DPS, I believe. Other model-based guidance approaches may use a different parameterization of the time-dependent variance (rather than with gamma), or a different loss altogether (cBottle TC uses BinaryCrossEntropy).

for guidance, guidance_args, guidance_kwargs in zip(
    guidances, guidances_args, guidances_kwargs
):
    if isinstance(guidance, ModelBasedGuidance):

There is a fundamental difference between kinds of guidance that use the clean and the noisy state (e.g. cBottle TC), so maybe it is worth making a separate class for guidance that does not require the clean state?


# Guidance (e.g. posterior sampling, etc.)
guidance_sum = 0.0
if guidances:

I like this modularity/allowing for multiple guidance terms!

# Activate gradient computation if needed for guidance
with torch.set_grad_enabled(req_grad):
    if req_grad:
        x_hat_in = x_hat.clone().detach().requires_grad_(True)

nit: first detaching and then cloning is more efficient.
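The suggestion can be sketched as follows (`x_hat` here is a stand-in tensor): detaching first means the subsequent clone is never recorded in the autograd graph, whereas clone-then-detach records a node and immediately discards it.

```python
import torch

x_hat = torch.randn(2, 3, requires_grad=True)

# clone() on a graph-connected tensor records a CloneBackward node that the
# subsequent detach() immediately discards; detaching first avoids that
# autograd bookkeeping entirely. The resulting tensor is the same either way:
# a fresh leaf holding a copy of x_hat's values.
x_hat_in = x_hat.detach().clone().requires_grad_(True)
```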

return torch.cat(x_generated)


class EDMStochasticSampler:

@CharlelieLrt do you plan to decouple the sampler from the choice of time discretization? E.g., we could use this Heun sampler with other time steps than the ones proposed in the EDM paper.
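For reference, the EDM noise-level discretization from Karras et al. (2022) can be written as a standalone schedule function; a sampler decoupled from the schedule could accept such a callable (or the resulting tensor of sigmas) as an argument. The function name and defaults below are illustrative, not part of the PR.

```python
import torch


def edm_sigma_steps(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM noise-level discretization (Karras et al., 2022, Eq. 5).
    Returns ``num_steps`` sigmas decreasing from ``sigma_max`` to
    ``sigma_min``, plus a trailing 0 for the final step."""
    i = torch.arange(num_steps, dtype=torch.float64)
    sigmas = (
        sigma_max ** (1 / rho)
        + i / (num_steps - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))
    ) ** rho
    return torch.cat([sigmas, torch.zeros(1, dtype=torch.float64)])
```

A sampler that takes the sigmas as input could then run the same Heun update over any monotone schedule, not just this one.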

import torch


class ModelBasedGuidance:

I would maybe call this DPSGuidance (to be more precise, this implementation also assumes Gaussian noise model), since there are several other guidance methods; see, e.g., https://arxiv.org/pdf/2503.11043 for an overview.

x : torch.Tensor
    The latent state of the diffusion model, typically of shape
    :math:`(B, *)`.
sigma : torch.Tensor

I think more generally the input to the model is t (which just coincides with sigma for the VE schedule in the EDM formulation).
