
Noise regularization in distilled DMs #11

@OrGreenberg

Hi,

Thank you for this inspiring work!
I have a small question, if I may, regarding the noise regularization process.

In models trained with the DDPM-based scheme (such as the one used in pix2pix-zero), the noise regularization process makes sense: the UNet is optimized to predict Gaussian noise with mean = 0 and var = 1. However, in distilled DMs, particularly ADD-based ones (adversarial diffusion distillation), this loss is not part of the training scheme. As a result, the outputs of the time-distilled UNet in models like SD-turbo and SDXL-turbo do not necessarily follow a Gaussian distribution with those parameters (mean 0, variance 1). In fact, the variance of the model's output decreases as t approaches 0.
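For concreteness, this is the kind of Gaussian-prior regularizer I have in mind: a minimal sketch in the spirit of pix2pix-zero's autocorrelation + KL terms. The function name, the single-scale shift term, and the equal weighting are illustrative, not the paper's exact formulation:

```python
import torch

def noise_regularization_loss(noise_pred: torch.Tensor) -> torch.Tensor:
    """Sketch of a Gaussian-prior regularizer on predicted noise:
    a KL term pulling the distribution toward N(0, 1), plus a
    penalty on spatial autocorrelation at a 1-pixel shift."""
    mu = noise_pred.mean()
    var = noise_pred.var()
    # KL divergence between N(mu, var) and N(0, 1)
    kl = 0.5 * (var + mu**2 - 1.0 - torch.log(var))
    # Pairwise autocorrelation at 1-pixel shifts (one scale shown;
    # pix2pix-zero uses a multi-scale pyramid)
    ac_h = (noise_pred * torch.roll(noise_pred, shifts=1, dims=-1)).mean() ** 2
    ac_v = (noise_pred * torch.roll(noise_pred, shifts=1, dims=-2)).mean() ** 2
    return kl + ac_h + ac_v
```

A regularizer like this only makes sense if the target distribution of noise_pred really is N(0, 1), which is exactly what seems to break down in the distilled case.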

Here are some example logs (produced using SD-turbo with 4 inference steps) to illustrate this:

```
timestep 999.0: noise_pred mean =  0.0030574470292776823 | noise_pred var = 0.978159487247467
timestep 749.0: noise_pred mean =  0.0017684325575828552 | noise_pred var = 0.9806435704231262
timestep 499.0: noise_pred mean = -0.0025877265725284815 | noise_pred var = 0.877240777015686
timestep 249.0: noise_pred mean = -0.0006634604651480913 | noise_pred var = 0.7107224464416504
```
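Statistics like these can be reproduced with a loop along the following lines. This is a rough sketch against the diffusers API; the checkpoint ID, prompt, and latent shape are illustrative assumptions, and the exact encode_prompt return values may differ across diffusers versions:

```python
import torch
from diffusers import AutoPipelineForText2Image

# Sketch: log the raw UNet output statistics at each of 4 steps.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

pipe.scheduler.set_timesteps(4, device="cuda")
latents = torch.randn(1, 4, 64, 64, device="cuda", dtype=torch.float16)
latents = latents * pipe.scheduler.init_noise_sigma  # scheduler-specific scaling

prompt_embeds, _ = pipe.encode_prompt(
    "a photo of a cat",  # illustrative prompt
    device="cuda",
    num_images_per_prompt=1,
    do_classifier_free_guidance=False,  # SD-turbo runs without CFG
)

for t in pipe.scheduler.timesteps:
    latent_in = pipe.scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = pipe.unet(
            latent_in, t, encoder_hidden_states=prompt_embeds
        ).sample
    print(
        f"timestep {t.item()}: noise_pred mean = {noise_pred.mean().item()}"
        f" | noise_pred var = {noise_pred.var().item()}"
    )
    latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample
```

The variance drop at small t is visible regardless of the prompt I use.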

Considering this, do you think noise regularization is still relevant in ADDs?

Thank you in advance for your time!

Or
