PyTorch implementation of a conditional VQ-VAE-2 for generating high-fidelity multi-object images from text captions.
Original paper: Generating Diverse High-Fidelity Images with VQ-VAE-2 (Razavi et al., 2019).
This implementation is set up for the MS-COCO dataset (2014 captions). It currently supports the hierarchical VQ-VAE and the PixelSNAIL prior.
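In VQ-VAE-2 the encoder produces continuous latent maps (a top level and a bottom level), and every latent vector is replaced by its nearest entry in a learned codebook; the resulting integer indices are the discrete "codes" that the prior later models. The sketch below shows only that nearest-neighbour lookup, in plain Python so it runs without PyTorch. It is not this repository's code: `quantize` and `dequantize` are hypothetical names, and training details such as the straight-through gradient and the commitment loss are omitted.

```python
# Illustrative sketch of the vector-quantization step in a VQ-VAE.
# Hypothetical helpers, not taken from vqvae.py.
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    codes = []
    for z in latents:
        # argmin over codebook entries by squared distance
        best = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        codes.append(best)
    return codes


def dequantize(codes: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Look up the codebook vectors for a list of indices (the decoder's input)."""
    return [list(codebook[k]) for k in codes]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
    latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]
    print(quantize(latents, codebook))  # [0, 1, 2]
```

In the hierarchical model the same lookup is applied separately to the top and bottom latent maps, each with its own codebook.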
The code was converted from a Jupyter notebook (.ipynb).
Credits: the vqvae_prior.py code is adapted from kamenbliznashki's implementation.
Requirements:
- The MS-COCO captions dataset, downloaded locally
- PyTorch >= 1.6
- A GPU environment: the PixelSNAIL prior (vqvae_prior.py) is computationally heavy to train, especially on high-resolution images
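To confirm an environment meets the "PyTorch >= 1.6" requirement, the version string can be compared numerically. The helper below is a hypothetical sketch, written in plain Python so it does not itself depend on torch; `parse_version` and `meets_minimum` are illustrative names, not part of this repository.

```python
# Hypothetical helper: check a PyTorch version string against the
# "PyTorch >= 1.6" requirement, without importing torch.
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '1.6.0', '1.13.1+cu117', '2.1.0a0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: Tuple[int, int] = (1, 6)) -> bool:
    """True if `version` is at least `minimum`, compared as integer tuples."""
    return parse_version(version) >= minimum
```

In a real environment this would be called as `meets_minimum(torch.__version__)`, alongside `torch.cuda.is_available()` to confirm a GPU is visible. Comparing integer tuples avoids the string-comparison pitfall where `"1.13" < "1.6"`.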
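Usage:
- Train the VQ-VAE (vqvae.py)
- Extract the discrete latent codes of the dataset with the trained VQ-VAE
- Train the PixelSNAIL prior (vqvae_prior.py) on the extracted codes
- Sample new images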
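After the codes are extracted, an autoregressive prior such as PixelSNAIL learns to predict each code from the codes before it in raster-scan order, and sampling walks the same order one position at a time; the sampled code maps are then decoded by the VQ-VAE. The sketch below shows that ordering for a single code map in plain Python. It is not the repository's code: `training_pairs`, `sample_grid`, and the `prior` callable are hypothetical stand-ins for the trained model, and conditioning (on the caption, or of the bottom level on the top-level codes) is left out.

```python
# Conceptual sketch of raster-order training targets and sampling for an
# autoregressive prior over a 2-D code map. Hypothetical, not vqvae_prior.py.
import random
from typing import Callable, Iterator, List, Sequence, Tuple

Grid = List[List[int]]
# Hypothetical prior interface: given the codes generated so far, return a
# probability distribution over the codebook indices.
Prior = Callable[[List[int]], Sequence[float]]


def raster_order(grid: Grid) -> List[int]:
    """Flatten a 2-D code map row by row (raster-scan order)."""
    return [code for row in grid for code in row]


def training_pairs(grid: Grid) -> Iterator[Tuple[List[int], int]]:
    """Yield (context, target) pairs: each code is predicted from all earlier codes."""
    flat = raster_order(grid)
    for i, target in enumerate(flat):
        yield flat[:i], target


def sample_grid(prior: Prior, height: int, width: int, seed: int = 0) -> Grid:
    """Draw codes one position at a time in raster order, then reshape to a grid."""
    rng = random.Random(seed)
    flat: List[int] = []
    for _ in range(height * width):
        probs = prior(flat)
        code = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        flat.append(code)
    return [flat[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    print(list(training_pairs([[1, 2], [3, 4]])))
    # A toy "prior" that alternates between codes 0 and 1.
    alternating = lambda ctx: [1.0, 0.0] if len(ctx) % 2 == 0 else [0.0, 1.0]
    print(sample_grid(alternating, 2, 2))  # [[0, 1], [0, 1]]
```

A real prior computes all of these conditionals in one pass during training using causal masking, so `training_pairs` only illustrates what is being predicted. Sampling, by contrast, is inherently sequential, which is one reason generating images with the prior is slow.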