unit4/README.md
The idea of using an existing model to 'teach' a new model can be extended to create smaller or faster models via distillation.
NB: A distilled version of Stable Diffusion is due to be released fairly soon.
Key references:
- [Progressive Distillation for Fast Sampling of Diffusion Models](http://arxiv.org/abs/2202.00512)
- [On Distillation of Guided Diffusion Models](http://arxiv.org/abs/2210.03142)
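To make the first reference above concrete, here is a minimal sketch of the progressive-distillation training signal, assuming `teacher` and `student` are noise-prediction networks and `alpha_bar` is a cumulative noise schedule (all names here are illustrative, not from the course code or the paper's implementation):

```python
import torch
import torch.nn.functional as F

def ddim_step(eps_model, x_t, t, t_next, alpha_bar):
    """One deterministic DDIM step from timestep t to t_next."""
    a, a_next = alpha_bar[t], alpha_bar[t_next]
    eps = eps_model(x_t, t)                            # predicted noise
    x0 = (x_t - (1 - a).sqrt() * eps) / a.sqrt()       # implied clean image
    return a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps

def distill_loss(teacher, student, x_t, t, t_mid, t_next, alpha_bar):
    # The student must match TWO consecutive teacher steps with ONE of its own.
    with torch.no_grad():                              # the teacher defines the target
        x_mid = ddim_step(teacher, x_t, t, t_mid, alpha_bar)
        target = ddim_step(teacher, x_mid, t_mid, t_next, alpha_bar)
    pred = ddim_step(student, x_t, t, t_next, alpha_bar)
    return F.mse_loss(pred, target)
```

Repeating this halving, with the trained student becoming the teacher for the next round, is what yields models that sample well in very few steps. The paper itself uses a more careful parameterization of the student's target, so treat this purely as the shape of the idea.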
Key training improvements:
- 'Knowledge Enhancement' - incorporating pre-trained image captioning and object detection models into the training process to create more informative captions and produce better performance ([ERNIE-ViLG 2.0](http://arxiv.org/abs/2210.15257))
- 'Mixture of Denoising Experts' (MoDE) - training different variants of the model ('experts') for different noise levels, as illustrated in the image above from the [ERNIE-ViLG 2.0 paper](http://arxiv.org/abs/2210.15257) and sketched below.
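A toy version of that routing logic, with tiny convolutional stand-ins for the experts and made-up timestep cutoffs (nothing here comes from the ERNIE-ViLG 2.0 codebase), might look like:

```python
import torch
import torch.nn as nn

class MoDEDenoiser(nn.Module):
    """Route each denoising step to the expert that owns that noise range."""
    def __init__(self, experts, cutoffs):
        super().__init__()
        self.experts = nn.ModuleList(experts)   # ordered high-noise -> low-noise
        self.cutoffs = cutoffs                  # descending timestep boundaries

    def forward(self, x, t: int):
        idx = sum(t < c for c in self.cutoffs)  # how many boundaries t falls below
        return self.experts[idx](x)             # real experts would also condition on t

experts = [nn.Conv2d(3, 3, 3, padding=1) for _ in range(3)]
model = MoDEDenoiser(experts, cutoffs=[667, 333])
out = model(torch.randn(1, 3, 64, 64), t=800)   # t=800 routes to the high-noise expert
```

Only one expert runs at each step, so total model capacity grows without increasing the compute cost per denoising step.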
Key references:
- [Elucidating the Design Space of Diffusion-Based Generative Models](http://arxiv.org/abs/2206.00364)
- [eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers](http://arxiv.org/abs/2211.01324)
- [ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts](http://arxiv.org/abs/2210.15257)
_Still frames from [sample videos generated with Imagen Video](https://imagen.research.google/video/)_
A video can be represented as a sequence of images, and the core ideas of diffusion models can be applied to these sequences. Recent work has focused on finding appropriate architectures (such as '3D UNets', which operate on entire sequences) and on working efficiently with video data. Since high-frame-rate video involves far more data than still images, current approaches tend to first generate low-resolution, low-frame-rate video and then apply spatial and temporal super-resolution to produce the final high-quality video outputs.
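As a schematic of that cascade, with toy stand-ins only (the 'base model' returns random frames and the super-resolution stages are plain interpolation, just to show how the stages compose; none of this is Imagen Video's actual API):

```python
import torch
import torch.nn.functional as F

def base_model(prompt: str):
    """Pretend text-conditioned base stage: 16 low-res frames, shape (T, C, H, W)."""
    return torch.randn(16, 3, 24, 40)

def temporal_sr(video):
    """Double the frame rate by interpolating along the time axis."""
    T, C, H, W = video.shape
    flat = video.permute(1, 2, 3, 0).reshape(1, C * H * W, T)
    flat = F.interpolate(flat, scale_factor=2, mode="linear", align_corners=False)
    return flat.reshape(C, H, W, -1).permute(3, 0, 1, 2)

def spatial_sr(video):
    """Upsample every frame 4x in space."""
    return F.interpolate(video, scale_factor=4, mode="bilinear", align_corners=False)

video = spatial_sr(temporal_sr(base_model("a panda eating bamboo")))
print(video.shape)  # torch.Size([32, 3, 96, 160])
```

In the real systems each stage is itself a diffusion model conditioned on the stage before it, but the pipeline composes in exactly this base-then-upsample order.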
- [Imagen Video: High Definition Video Generation with Diffusion Models](https://imagen.research.google/video/paper.pdf)
The UNet architecture at the heart of many current diffusion models is also being replaced with alternatives, most notably transformer-based architectures such as DiT (linked in the references below).
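As a rough illustration of that transformer direction, latents can be split into patches, processed as a token sequence, and reassembled into an image-shaped output. Every name and dimension below is made up, and the real DiT conditions on the timestep via adaLN-Zero rather than an extra token, so this is only a shape-level sketch:

```python
import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    """Toy diffusion transformer: patchify -> transformer -> unpatchify."""
    def __init__(self, patch=4, channels=4, dim=256, depth=4, heads=4):
        super().__init__()
        self.p = patch
        self.embed = nn.Linear(patch * patch * channels, dim)    # patches -> tokens
        self.t_embed = nn.Embedding(1000, dim)                   # timestep as an extra token
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.unembed = nn.Linear(dim, patch * patch * channels)  # tokens -> patches

    def forward(self, x, t):
        B, C, H, W = x.shape
        p = self.p
        patches = x.unfold(2, p, p).unfold(3, p, p)              # (B, C, H/p, W/p, p, p)
        tokens = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        seq = torch.cat([self.t_embed(t)[:, None], self.embed(tokens)], dim=1)
        out = self.unembed(self.blocks(seq)[:, 1:])              # drop the timestep token
        out = out.reshape(B, H // p, W // p, C, p, p)
        return out.permute(0, 3, 1, 4, 2, 5).reshape(B, C, H, W)

model = TinyDiT()
noise_pred = model(torch.randn(2, 4, 32, 32), torch.tensor([10, 500]))
```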
With each new paper, more efficient and performant approaches are being developed, and it may be some time before we see what peak performance looks like on these kinds of iterative refinement tasks. There is much more still to explore!
Key references:
- [Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise](http://arxiv.org/abs/2208.09392)
- [Scalable Diffusion Models with Transformers (DiT)](https://www.wpeebles.com/DiT)