This repository contains a clean and educational PyTorch implementation of the paper:
Neural Discrete Representation Learning
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
arXiv:1711.00937

The implementation demonstrates Vector Quantized Variational AutoEncoders (VQ-VAE), a novel approach to learning discrete latent variables in deep generative models.
Unlike standard VAEs, which use continuous latent variables, VQ-VAE learns a discrete latent representation through vector quantization, enabling better compression and more structured latent spaces. The model is especially useful for image, audio, and video generation.
Follow these steps to set up and run the project locally.
Prerequisites
You'll need Python 3 and the following libraries installed:
- PyTorch
- Torchvision
- NumPy
- Matplotlib
- Seaborn
- tqdm
- torchview (optional, for visualizing the model)
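If you're starting from a clean environment, something along these lines should install everything (assuming the standard PyPI package names):

```bash
pip install torch torchvision numpy matplotlib seaborn tqdm torchview
```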
The model consists of the following components (a sketch of the quantization step follows this list):
- Encoder: Maps the input image to a continuous latent space.
- Codebook (Embedding Space): Discrete embedding vectors that act as learned latent variables.
- Vector Quantizer: Replaces the continuous encoder output with the nearest codebook vector.
- Decoder: Reconstructs the image from the quantized latent codes.
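As an illustration, here is a minimal PyTorch sketch of the vector-quantization step, assuming a codebook of 512 vectors of dimension 64 (the class name and hyperparameters are illustrative, not necessarily what this repo uses):

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Replaces each encoder output vector with its nearest codebook vector."""

    def __init__(self, num_embeddings=512, embedding_dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_embeddings, embedding_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_embeddings, 1.0 / num_embeddings)

    def forward(self, z_e):
        # z_e: (B, C, H, W) continuous encoder output; flatten to (B*H*W, C)
        b, c, h, w = z_e.shape
        flat = z_e.permute(0, 2, 3, 1).reshape(-1, c)
        # Squared L2 distance from every encoder vector to every codebook vector
        distances = (flat.pow(2).sum(1, keepdim=True)
                     - 2 * flat @ self.codebook.weight.t()
                     + self.codebook.weight.pow(2).sum(1))
        indices = distances.argmin(dim=1)  # nearest codevector ids
        z_q = self.codebook(indices).view(b, h, w, c).permute(0, 3, 1, 2)
        # Straight-through estimator: pass decoder gradients to the encoder unchanged
        z_q_st = z_e + (z_q - z_e).detach()
        return z_q_st, z_q, indices
```

The `detach()` trick in the last line is the straight-through estimator from the paper: the decoder's gradient is copied past the non-differentiable nearest-neighbour lookup directly into the encoder.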
The loss function is composed of three terms (see the code sketch after this list):
- Reconstruction Loss: $\log p(x|z_q(x))$ ensures the output looks like the input; here it is implemented as mean squared error.
- Codebook Loss: $||sg[z_e(x)] - e||_2^2$ updates the codevectors by moving them closer to the encoder output $z_e(x)$; the stop-gradient $sg[\cdot]$ on $z_e(x)$ keeps this term from affecting the encoder, avoiding backpropagation through the non-differentiable nearest-neighbour $\min$.
- Commitment Loss: $\beta||z_e(x) - sg[e]||_2^2$ is the mirror image of the codebook loss: it encourages the encoder output to stay close to its codevector, with a weight $\beta$ to allow for some divergence.
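In code, the three terms fit in a few lines. A minimal sketch, assuming `z_e` is the encoder output and `z_q` its quantized counterpart before the straight-through copy (as returned by the `VectorQuantizer` above):

```python
import torch
import torch.nn.functional as F

def vqvae_loss(x, x_recon, z_e, z_q, beta=0.25):
    """Total VQ-VAE objective; beta = 0.25 is the value used in the paper's experiments."""
    # Reconstruction loss: MSE stands in for log p(x | z_q(x))
    recon_loss = F.mse_loss(x_recon, x)
    # Codebook loss: detach z_e so only the codevectors receive gradients
    codebook_loss = F.mse_loss(z_q, z_e.detach())
    # Commitment loss: detach z_q so only the encoder receives gradients
    commitment_loss = F.mse_loss(z_e, z_q.detach())
    return recon_loss + codebook_loss + beta * commitment_loss
```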
Clone the repo:

```bash
git clone https://github.com/your-username/Neural-Discrete-Representation-Learning.git
cd Neural-Discrete-Representation-Learning
```