NaiveDiffusion is a self-study project focused on implementing a diffusion model for image generation. The implementation closely follows the original algorithms presented in the DDPM paper and adopts the U-Net architecture from the original U-Net paper, enhanced with a multi-head attention block inserted between the convolution and downsampling/upsampling operations.
The diffusion model minimizes the KL divergence between the forward process posterior $q(x_{t-1} \mid x_t, x_0)$ and the learned reverse process $p_\theta(x_{t-1} \mid x_t)$. In practice, this reduces to minimizing the mean squared error (MSE) between the noise added during the forward process and the noise predicted by the network.
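Concretely, the simplified training objective from the DDPM paper takes the form

$$
L_{\text{simple}}(\theta) = \mathbb{E}_{t, x_0, \epsilon}\left[ \left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2 \right],
$$

where $\epsilon$ is the Gaussian noise sampled in the forward process and $\epsilon_\theta(x_t, t)$ is the noise predicted by the U-Net.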
The predicted noise is output by a parameterized U-Net, which takes in a noised image along with its corresponding timestep. The forward noise process is computed using formulas defined in the DDPM paper and is encapsulated in the `Diffusion` class.
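As a rough sketch, the closed-form forward step $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$ can be implemented along these lines (the function and argument names here are illustrative, not necessarily the repo's actual API):

```python
import torch

def forward_noise(x0, t, alpha_bar):
    """Sample x_t ~ q(x_t | x_0) in closed form and return both x_t and the noise used."""
    eps = torch.randn_like(x0)                # Gaussian noise that the U-Net must learn to predict
    abar_t = alpha_bar[t].view(-1, 1, 1, 1)   # cumulative product of (1 - beta) up to step t, per batch element
    xt = torch.sqrt(abar_t) * x0 + torch.sqrt(1.0 - abar_t) * eps
    return xt, eps
```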
The repo consists of the following key Python files:
- `train.py`: Implements the training loop for the diffusion model using selected datasets such as CIFAR-10 or MNIST (see the sketch after this list for what one training step looks like).
- `model/diffusion.py`: Defines the `Diffusion` class, which handles the forward noise process and denoising during image generation.
- `model/module.py`: Implements the `UNet` class, which learns the mapping function from Gaussian noise space to the training data distribution. This version includes a multi-head attention block for improved feature learning.
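One training step can be pictured roughly as follows. The method name `noise_images` and the exact call signatures are assumptions for illustration; the repo's actual `Diffusion` and `UNet` interfaces may differ:

```python
import torch
import torch.nn.functional as F

def train_step(unet, diffusion, optimizer, images, num_timesteps=1000):
    """One optimization step: noise a batch, predict the noise, regress with MSE."""
    t = torch.randint(0, num_timesteps, (images.size(0),), device=images.device)
    xt, noise = diffusion.noise_images(images, t)   # forward process (hypothetical method name)
    pred = unet(xt, t)                              # U-Net predicts the noise added at step t
    loss = F.mse_loss(pred, noise)                  # simplified DDPM objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```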
- Clone the repository:

  ```bash
  git clone https://github.com/liuximeng09/NaiveDiffusion.git
  cd NaiveDiffusion
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Train the model (see the sketch after this list for how the dataset download typically works):

  ```bash
  python train.py  # remember to select your dataset; it downloads automatically
  ```
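The automatic download in `train.py` presumably relies on torchvision's built-in dataset loaders; the snippet below shows the kind of setup involved and is an assumption for illustration, not the repo's exact code:

```python
import torchvision
import torchvision.transforms as T

# Example dataset setup (assumed); swap CIFAR10 for MNIST to train on MNIST instead.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale pixel values to [-1, 1]
])
dataset = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
```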