Skip to content

Commit c165cf0

Browse files
roughing out structure
1 parent 9273ff2 commit c165cf0

File tree

1 file changed

+61
-11
lines changed

1 file changed

+61
-11
lines changed

unit4/README.md

Lines changed: 61 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,31 +1,81 @@
11
# Unit 4: Going Further with Diffusion Models
22

3-
Welcome to Unit 4 of the Hugging Face Diffusion Models Course!
3+
Welcome to Unit 4 of the Hugging Face Diffusion Models Course! In this unit we will look at some of the many improvements and extensions to diffusion models appearing in the latest research. It will be less code-heavy than previous units have been, and is designed to give you a jumping off point for further research and set up for possible additional units in the future.
44

55
## Start this Unit :rocket:
66

77
Here are the steps for this unit:
88

9-
- Make sure you've [signed up for this course](https://huggingface.us17.list-manage.com/subscribe?u=7f57e683fa28b51bfc493d048&id=ef963b4162) so that you can be notified when new material is released
9+
- Make sure you've [signed up for this course](https://huggingface.us17.list-manage.com/subscribe?u=7f57e683fa28b51bfc493d048&id=ef963b4162) so that you can be notified when additional units are added to the course
1010
- Read through the material below for an overview of the different topics covered in this unit
1111
- Dive deeper into any specific topics with the linked videos and resources
1212
- Complete the [TODO some sort of exercise/capstone project]
1313

14-
1514
:loudspeaker: Don't forget to join the [Discord](https://huggingface.co/join/discord), where you can discuss the material and share what you've made in the `#diffusion-models-class` channel.
1615

1716
## Introduction
1817

1918
By the end of Unit 3...
2019

21-
Things to cover
22-
- Better Training (different aspect ratios, additional losses)
23-
- Fancier models (MoE, versatile difusion for more conditioning)
24-
- Faster Inference (Distillation)
25-
- Maybe a rundown of specific models (DALL-E 2, Imagen, eDiffi)
26-
- Video
27-
- Audio
28-
- Moving beyond diffusion (MaskGIT/MUSE, Paella and co)
20+
## Faster Sampling via Distillation
21+
22+
Introduce the idea... The core mechanism is illustrated in this diagram from the [paper that introduced the idea](http://arxiv.org/abs/2202.00512):
23+
24+
![image](https://user-images.githubusercontent.com/6575163/211016659-7dac24a5-37e2-45f9-aba8-0c573937e7fb.png)
25+
26+
The idea of using an existing model to 'teach' a new model can be extended to create guided models where the classifier-free guidance technique is used by the teacher model and the student model must learn to produce an equivalent output in a single step based on an additional input specifying the targeted guidance scale. This further reduces the number of model evaluations required to produce high-quality samples.
27+
28+
Key papers:
29+
- [PROGRESSIVE DISTILLATION FOR FAST SAMPLING OF DIFFUSION MODELS](http://arxiv.org/abs/2202.00512)
30+
- [ON DISTILLATION OF GUIDED DIFFUSION MODELS](http://arxiv.org/abs/2210.03142)
31+
32+
## Training Improvements
33+
34+
![image](https://user-images.githubusercontent.com/6575163/211021220-e87ca296-cf15-4262-9359-7aeffeecbaae.png)
35+
_Figure 2 from the [ERNIE-ViLG 2.0 paper](http://arxiv.org/abs/2210.15257)_
36+
37+
There have been a number of additional tricks developed to improve training. A few key ones are
38+
- Tuning the noise schedule, loss weighting and sampling trajectories (Karras et al)
39+
- Training on diverse aspect rations [TODO link patrick's talk from the launch event]
40+
- Cascading diffusion models, training one model at low resolution and then one or more super-res models (D2, Imagen, eDiffi)
41+
- Rich text embeddings (Imagen) or multiple types of conditioning (eDiffi)
42+
- Incorporating pre-trained image captioning and object detection models into the training process to create more informative captions and produce better performance in a process known as 'knowledge enhancement' (ERNIE-ViLG 2.0)
43+
- MoE training different variants of the model ('experts') for different noise levels...
44+
45+
Key Papers:
46+
- [Elucidating the Design Space of Diffusion-Based Generative Models](http://arxiv.org/abs/2206.00364)
47+
- [eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers](http://arxiv.org/abs/2211.01324)
48+
- [ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts](http://arxiv.org/abs/2210.15257)
49+
- [Imagen][TODO]
50+
51+
## Inference Improvements
52+
53+
- eDiffi paint with words,
54+
- Image editing with diffusion models video
55+
- Textual inversion, null text inversion
56+
- ??
57+
58+
## Video
59+
60+
Video diffusion, Imagen Video
61+
62+
Key Papers:
63+
- [Video Diffusion Models](https://video-diffusion.github.io/)
64+
- [IMAGEN VIDEO: HIGH DEFINITION VIDEO GENERATION WITH DIFFUSION MODELS](https://imagen.research.google/video/paper.pdf)
65+
66+
## Audio
67+
68+
- Riffusion (and possibly notebook on the idea)
69+
- Non-spectrogram paper
70+
71+
## New Architectures and Approaches
72+
73+
Transformer in place of UNet (DiT)
74+
75+
Recurrent Interface Networks (https://arxiv.org/pdf/2212.11972.pdf)
76+
77+
MUSE/MaskGIT and Paella
78+
2979

3080
## Project Time
3181

0 commit comments

Comments
 (0)