Natural-Discrete-Representation-Learning-Coding-a-VQ-VAE-from-scrach

This repository contains a clean and educational PyTorch implementation of the paper:

Neural Discrete Representation Learning
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
arXiv:1711.00937


The implementation demonstrates Vector Quantized Variational AutoEncoders (VQ-VAE) – a novel approach to learning discrete latent variables in deep generative models.


📌 Introduction

Unlike standard VAEs that use continuous latent variables, VQ-VAE learns a discrete latent representation through vector quantization, enabling better compression and structure learning in the latent space. This model is especially useful in image, audio, and video generation.


🚀 Getting Started

Follow these steps to set up and run the project locally.

Clone the repo:

git clone https://github.com/your-username/Neural-Discrete-Representation-Learning.git
cd Neural-Discrete-Representation-Learning

Prerequisites

You'll need Python 3 and the following libraries installed:

  • PyTorch
  • Torchvision
  • NumPy
  • Matplotlib
  • Seaborn
  • tqdm
  • torchview (optional, for visualizing the model)
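
These can be installed with pip; the line below is a suggested command (package names only, versions unpinned):

pip install torch torchvision numpy matplotlib seaborn tqdm torchview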


🧠 Model Architecture

The model consists of:

  • Encoder: Maps the input image to a continuous latent space.
  • Codebook (Embedding Space): Discrete embedding vectors that act as learned latent variables.
  • Vector Quantizer: Replaces the continuous encoder output with the nearest codebook vector (a minimal sketch follows this list).
  • Decoder: Reconstructs the image from the quantized latent codes.
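
Below is a minimal sketch of the quantization step, assuming the codebook is held in an nn.Embedding. The class name VectorQuantizer and the default sizes (num_embeddings=512, embedding_dim=64) are illustrative choices, not necessarily what this repository uses:

import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Replaces each continuous encoder vector with its nearest codebook entry."""

    def __init__(self, num_embeddings=512, embedding_dim=64):
        super().__init__()
        # Codebook (embedding space): num_embeddings learned vectors of size embedding_dim
        self.codebook = nn.Embedding(num_embeddings, embedding_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_embeddings, 1.0 / num_embeddings)

    def forward(self, z_e):
        # z_e: (B, D, H, W) encoder output; flatten to (B*H*W, D) for the nearest-neighbour search
        B, D, H, W = z_e.shape
        flat = z_e.permute(0, 2, 3, 1).reshape(-1, D)

        # Squared Euclidean distance from every encoder vector to every codebook vector
        distances = (flat.pow(2).sum(1, keepdim=True)
                     - 2 * flat @ self.codebook.weight.t()
                     + self.codebook.weight.pow(2).sum(1))

        # Index of the closest code for each spatial position
        indices = distances.argmin(dim=1)
        z_q = self.codebook(indices).reshape(B, H, W, D).permute(0, 3, 1, 2)

        # Straight-through estimator: the decoder sees the quantized values,
        # but reconstruction gradients flow back into the encoder through z_e
        z_q_st = z_e + (z_q - z_e).detach()
        return z_q_st, z_q, indices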

The loss function is composed of:

  • Reconstruction Loss: $\log p(x|z_q(x))$ ensures the output looks like the input; we use Mean Squared Error for this term.
  • Codebook Loss: $||sg[z_e(x)] - e||_2^2$ updates the codevectors by pulling them towards the encoder output $z_e(x)$; the stop-gradient $sg[\cdot]$ on $z_e(x)$ means this term moves only the codebook, not the encoder.
  • Commitment Loss: $\beta||z_e(x) - sg[e]||_2^2$ is the mirror image of the codebook loss: it keeps the encoder output close to its chosen codevector, with the weight $\beta$ allowing some divergence (a training-step sketch combining all three terms follows this list).
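
Below is a minimal sketch of how these terms combine in one training step, using the VectorQuantizer sketched above; encoder, decoder, and the default β = 0.25 are illustrative assumptions rather than the repository's exact code:

import torch.nn.functional as F


def vqvae_loss(x, encoder, decoder, quantizer, beta=0.25):
    z_e = encoder(x)                 # continuous encoder output z_e(x)
    z_q_st, z_q, _ = quantizer(z_e)  # straight-through and raw quantized vectors

    # Codebook loss  ||sg[z_e(x)] - e||^2 : pulls the chosen codevectors towards the encoder output
    codebook_loss = F.mse_loss(z_q, z_e.detach())

    # Commitment loss  ||z_e(x) - sg[e]||^2 : keeps the encoder output close to its codevector
    commitment_loss = F.mse_loss(z_e, z_q.detach())

    # Reconstruction loss, using MSE as a stand-in for log p(x | z_q(x))
    x_recon = decoder(z_q_st)
    recon_loss = F.mse_loss(x_recon, x)

    return recon_loss + codebook_loss + beta * commitment_loss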

