Here are the steps for this unit:
- Make sure you've [signed up for this course](https://huggingface.us17.list-manage.com/subscribe?u=7f57e683fa28b51bfc493d048&id=ef963b4162) so that you can be notified when additional units are added to the course
- Read through the material below for an overview of the different topics covered in this unit
- Dive deeper into any specific topics with the linked videos and resources
- Explore the demo notebooks and then read the 'What Next' section for some project suggestions
:loudspeaker: Don't forget to join the [Discord](https://huggingface.co/join/discord), where you can discuss the material and share what you've made in the `#diffusion-models-class` channel.
Progressive distillation is a technique for taking an existing diffusion model and using it to train a new version that requires fewer steps for inference.
_Progressive Distillation illustrated (from the [paper](http://arxiv.org/abs/2202.00512))_
The idea of using an existing model to 'teach' a new model can be extended to create guided distilled models: the classifier-free guidance technique is applied by the teacher model, and the student model learns to produce an equivalent output in a single step, based on an additional input specifying the target guidance scale. This further reduces the number of model evaluations required to produce high-quality samples. [This video](https://www.youtube.com/watch?v=ZXuK6IRJlnk) gives an overview of the approach.
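To make the idea concrete, here is a minimal sketch of one guidance-distillation training step. The tiny MLPs are hypothetical stand-ins for real diffusion UNets, and the shapes and hyperparameters are illustrative only — not the setup from the paper:

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Hypothetical stand-in for a diffusion UNet: predicts noise from x plus conditioning."""
    def __init__(self, dim, cond_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + cond_dim, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=1))

dim = 8
teacher_cond = TinyDenoiser(dim, 1)    # conditional teacher (cond = timestep)
teacher_uncond = TinyDenoiser(dim, 1)  # unconditional teacher
student = TinyDenoiser(dim, 2)         # student also receives the guidance scale

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(16, dim)               # noisy inputs
t = torch.rand(16, 1)                  # timesteps
w = torch.rand(16, 1) * 4              # random guidance scales in [0, 4)

# Teacher target via classifier-free guidance (no gradients through teachers).
with torch.no_grad():
    eps_cond = teacher_cond(x, t)
    eps_uncond = teacher_uncond(x, t)
    target = eps_uncond + w * (eps_cond - eps_uncond)

# The student learns to match the guided output in a single evaluation,
# conditioned on the guidance scale w alongside the timestep.
pred = student(x, torch.cat([t, w], dim=1))
loss = nn.functional.mse_loss(pred, target)
opt.zero_grad()
loss.backward()
opt.step()
```

The key design point is that `w` is an *input* to the student, so one distilled network covers a whole range of guidance strengths instead of baking in a single scale.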
NB: A distilled version of Stable Diffusion is due to be released fairly soon.
_A spectrogram generated with Riffusion ([image source](https://www.riffusion.com/about))_
While there has been some work on generating audio directly with diffusion models (e.g. [DiffWave](https://arxiv.org/abs/2009.09761)), the most successful approach so far has been to convert the audio signal into a spectrogram, which effectively 'encodes' the audio as a 2D "image" that can then be used to train the kinds of diffusion models we're used to using for image generation. The resulting generated spectrograms can be converted back into audio using existing methods. This approach is behind the recently released Riffusion, which fine-tuned Stable Diffusion to generate spectrograms conditioned on text - [try it out here](https://www.riffusion.com/).
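The audio-to-"image" round trip can be sketched in plain PyTorch. The parameters below are illustrative, not Riffusion's actual settings:

```python
import math
import torch

# Encode one second of a 440 Hz tone as a 2D magnitude spectrogram,
# then recover the audio via the inverse STFT.
sample_rate = 22050
t = torch.arange(sample_rate) / sample_rate
waveform = torch.sin(2 * math.pi * 440 * t)

n_fft, hop = 1024, 256
window = torch.hann_window(n_fft)
stft = torch.stft(waveform, n_fft=n_fft, hop_length=hop, window=window,
                  return_complex=True)
spec = stft.abs()  # (freq_bins, frames): image-like, suitable for a 2D model

# Real pipelines (e.g. Riffusion) discard phase and recover it with an
# algorithm like Griffin-Lim; here we keep the phase so the round trip
# is near-exact.
recovered = torch.istft(stft, n_fft=n_fft, hop_length=hop, window=window,
                        length=waveform.shape[0])
```

In practice mel-scaled, log-magnitude spectrograms are usually preferred, since they better match both human hearing and the value ranges image models expect.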
## New Architectures and Approaches - Towards 'Iterative Refinement'
_Figure 1 from the [Cold Diffusion](http://arxiv.org/abs/2208.09392) paper_
We are slowly moving beyond the original narrow definition of "diffusion" models and towards a more general class of models that perform **iterative refinement**, where some form of corruption (like the addition of Gaussian noise in the forward diffusion process) is gradually reversed to generate samples. The 'Cold Diffusion' paper demonstrated that many other types of corruption can be iteratively 'undone' to generate images (examples shown above), and recent transformer-based approaches have demonstrated the effectiveness of token replacement or masking as a noising strategy.
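The generic sampling loop behind this idea can be sketched in a few lines. This is a toy version in the spirit of the Cold Diffusion paper's improved sampling algorithm: `degrade` and `restore` here are hypothetical stand-ins (a simple shrink-toward-zero corruption with its exact inverse); in a real system `restore` is a trained network and `degrade` could be blur, masking, noise, etc.:

```python
import torch

STEPS = 10

def degrade(x0, s):
    # Toy corruption: shrink the signal as severity s grows.
    return x0 * (1 - s / (STEPS + 1))

def restore(x, s):
    # Exact inverse of the toy corruption (a learned model in practice).
    return x / (1 - s / (STEPS + 1))

def sample(x_T):
    # Iterative refinement: at each step, estimate the clean sample,
    # then swap one level of corruption for a slightly milder one.
    x = x_T
    for s in range(STEPS, 0, -1):
        x0_hat = restore(x, s)                               # estimate clean sample
        x = x - degrade(x0_hat, s) + degrade(x0_hat, s - 1)  # step s -> s-1
    return x

x0 = torch.randn(4)
x_out = sample(degrade(x0, STEPS))  # recovers x0 exactly for this toy pair
```

With an imperfect (learned) restorer, the step-by-step update is what keeps errors from accumulating: each iteration only needs to undo a small increment of corruption.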
_Pipeline from [MaskGIT](http://arxiv.org/abs/2202.04200)_
The UNet architecture at the heart of many current diffusion models is also being replaced with alternatives, most notably various transformer-based architectures. In [Scalable Diffusion Models with Transformers (DiT)](https://www.wpeebles.com/DiT), a transformer is used in place of the UNet in a fairly standard diffusion model, with excellent results. [Recurrent Interface Networks](https://arxiv.org/pdf/2212.11972.pdf) applies a novel transformer-based architecture and training strategy in pursuit of additional efficiency. [MaskGIT](http://arxiv.org/abs/2202.04200) and [MUSE](http://arxiv.org/abs/2301.00704) use transformer models to work with tokenized representations of images, although the [Paella](https://arxiv.org/abs/2211.07292v1) model demonstrates that a UNet can also be applied successfully to these token-based regimes.
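The token-based refinement loop from MaskGIT can be sketched as follows. The `model` here is a hypothetical stand-in (random logits) for a trained bidirectional transformer, and the vocabulary size, sequence length, and schedule are illustrative only:

```python
import math
import torch

# MaskGIT-style parallel decoding: start fully masked, and at each step
# commit the most confident predictions while re-masking the rest.
VOCAB, SEQ_LEN, STEPS = 16, 32, 4
MASK = VOCAB  # reserve one extra id for the mask token

def model(tokens):
    # Stand-in: random logits. A real model would condition on the
    # already-committed tokens.
    return torch.randn(tokens.shape[0], SEQ_LEN, VOCAB)

tokens = torch.full((1, SEQ_LEN), MASK)
for step in range(1, STEPS + 1):
    conf, pred = model(tokens).softmax(dim=-1).max(dim=-1)
    pred = torch.where(tokens == MASK, pred, tokens)  # keep committed tokens
    conf = torch.where(tokens == MASK, conf, torch.full_like(conf, 2.0))
    # Cosine schedule: number of tokens left masked after this step
    # (reaches 0 on the final step, so decoding always completes).
    n_mask = int(SEQ_LEN * math.cos(math.pi / 2 * step / STEPS))
    if n_mask > 0:
        # Re-mask the least confident positions and retry them next step.
        idx = conf.topk(n_mask, dim=-1, largest=False).indices
        pred.scatter_(1, idx, MASK)
    tokens = pred
```

Because many tokens are committed in parallel at each step, a full image can be decoded in a handful of iterations rather than one token at a time.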
With each new paper, more efficient and more performant approaches are being developed, and it may be some time before we see what peak performance looks like on these kinds of iterative refinement tasks. There is much more still to explore!
We've covered a LOT of different ideas in this unit, many of which deserve much more detailed follow-on lessons in the future. For now, here are two demo notebooks for you to get hands-on with a couple of the ideas discussed above:
- TODO link Image Editing with DDIM Inversion notebook