
Noise regularization in distilled DMs #11

@OrGreenberg

Hi,

Thank you for this inspiring work!
I have a small question, if I may, regarding the noise regularization process.

In models trained with the DDPM-based scheme (such as the one used in pix2pix-zero), the noise regularization process makes sense: the UNet is optimized to predict Gaussian noise with mean = 0 and var = 1. However, in distilled DMs, particularly ADD-based ones (adversarial diffusion distillation), this loss is not part of the training scheme. As a result, the outputs of the time-distilled UNet in models like SD-turbo and SDXL-turbo do not necessarily follow a Gaussian distribution with those parameters (mean 0, variance 1). In fact, the variance of the model's output decreases as t approaches 0.
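For concreteness, this is the kind of Gaussian-prior regularizer I have in mind: a minimal sketch in the spirit of pix2pix-zero's autocorrelation + KL terms. The function name, the single-scale shift term, and the equal weighting are illustrative, not the paper's exact formulation:

```python
import torch

def noise_regularization_loss(noise_pred: torch.Tensor) -> torch.Tensor:
    """Sketch of a Gaussian-prior regularizer on predicted noise:
    a KL term pulling the distribution toward N(0, 1), plus a
    penalty on spatial autocorrelation at a 1-pixel shift."""
    mu = noise_pred.mean()
    var = noise_pred.var()
    # KL divergence between N(mu, var) and N(0, 1)
    kl = 0.5 * (var + mu**2 - 1.0 - torch.log(var))
    # Pairwise autocorrelation at 1-pixel shifts (one scale shown;
    # pix2pix-zero uses a multi-scale pyramid)
    ac_h = (noise_pred * torch.roll(noise_pred, shifts=1, dims=-1)).mean() ** 2
    ac_v = (noise_pred * torch.roll(noise_pred, shifts=1, dims=-2)).mean() ** 2
    return kl + ac_h + ac_v
```

A regularizer like this only makes sense if the target distribution of noise_pred really is N(0, 1), which is exactly what seems to break down in the distilled case.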

Here are some example logs (produced using SD-turbo with 4 inference steps) to illustrate this:

```
timestep 999.0: noise_pred mean =  0.0030574470292776823 | noise_pred var = 0.978159487247467
timestep 749.0: noise_pred mean =  0.0017684325575828552 | noise_pred var = 0.9806435704231262
timestep 499.0: noise_pred mean = -0.0025877265725284815 | noise_pred var = 0.877240777015686
timestep 249.0: noise_pred mean = -0.0006634604651480913 | noise_pred var = 0.7107224464416504
```
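Statistics like these can be reproduced with a loop along the following lines. This is a rough sketch against the diffusers API; the checkpoint ID, prompt, and latent shape are illustrative assumptions, and the exact encode_prompt return values may differ across diffusers versions:

```python
import torch
from diffusers import AutoPipelineForText2Image

# Sketch: log the raw UNet output statistics at each of 4 steps.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

pipe.scheduler.set_timesteps(4, device="cuda")
latents = torch.randn(1, 4, 64, 64, device="cuda", dtype=torch.float16)
latents = latents * pipe.scheduler.init_noise_sigma  # scheduler-specific scaling

prompt_embeds, _ = pipe.encode_prompt(
    "a photo of a cat",  # illustrative prompt
    device="cuda",
    num_images_per_prompt=1,
    do_classifier_free_guidance=False,  # SD-turbo runs without CFG
)

for t in pipe.scheduler.timesteps:
    latent_in = pipe.scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = pipe.unet(
            latent_in, t, encoder_hidden_states=prompt_embeds
        ).sample
    print(
        f"timestep {t.item()}: noise_pred mean = {noise_pred.mean().item()}"
        f" | noise_pred var = {noise_pred.var().item()}"
    )
    latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample
```

The variance drop at small t is visible regardless of the prompt I use.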

Considering this, do you think noise regularization is still relevant in ADDs?

Thank you in advance for your time!

Or
