tongdaxu/pytorch-image-tokenizer

Pytorch Image Tokenizer

  • A fork of https://github.com/Stability-AI/generative-models, with a focus on tokenizers
    • Practical-size implementations and training code for popular tokenizers such as VQ, FSQ, LFQ, and BSQ
    • Supports both the Stable Diffusion UNet and the BSQ ViT backbones
    • Includes pre-trained models and benchmarks on ImageNet 256x256

Usage

Prerequisites

  • dependencies are listed in environment.yaml
    conda env create --file=environment.yaml
    conda activate tokenizer

Installation

  • from source
    pip install .

Prepare your dataset

  • It is recommended to list the dataset in advance using
    python scripts/create_dataset_list.py --root $PATH_TO_DATASET_FOLDER --ext $IMAGE_EXTENSION --out $PATH_TO_OUTFILE
  • This step is not mandatory; it only speeds up training
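
To illustrate what listing a dataset in advance involves, here is a minimal sketch that walks a folder once and writes every matching image path to a text file, so later runs can read the list instead of re-scanning the disk. It is not the code of scripts/create_dataset_list.py; the function names `list_images` and `write_list` and the one-path-per-line output format are assumptions made for illustration.

```python
import os


def list_images(root, ext):
    """Return sorted paths (relative to root) of files whose name ends with ext."""
    ext = ext.lower()
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive match so ".JPEG" and ".jpeg" are both picked up.
            if name.lower().endswith(ext):
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root))
    return sorted(found)


def write_list(paths, out_file):
    """Write one path per line (assumed format, not necessarily the repo's)."""
    with open(out_file, "w", encoding="utf-8") as f:
        for path in paths:
            f.write(path + "\n")
```

Walking a large dataset such as ImageNet can take a noticeable amount of time on network or spinning-disk storage, which is presumably why caching the list speeds up training startup.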

Training Tokenizers using default config

  • modify the yaml file to match your system; pay special attention to "trainer-device", "trainer-num_nodes", and "data-train-params-root"

  • Gaussian VAE with stable diffusion UNet

    python main.py --config sd3unet_gaussian_kl_0.64.yaml --wandb
  • FSQ with stable diffusion UNet

    python main.py --config sd3unet_fsq_16.yaml --wandb
  • LFQ with stable diffusion UNet

    python main.py --config sd3unet_lfq_16.yaml --wandb
  • see ./configs/ for more configurations
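
The dashed names in the first bullet above ("trainer-device", "trainer-num_nodes", "data-train-params-root") read most naturally as paths into the nested yaml structure, with each dash separating one nesting level. The sketch below shows that reading with a plain dictionary. The nesting interpretation, the helper `set_by_path`, and all values are assumptions for illustration; check the actual config files for the real layout.

```python
def set_by_path(config, dashed_path, value):
    """Set a value in a nested dict, treating each '-' as one nesting level."""
    keys = dashed_path.split("-")
    node = config
    for key in keys[:-1]:
        # Create intermediate levels on demand.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


config = {}
# Placeholder values; replace them with settings that match your machine.
set_by_path(config, "trainer-device", [0])
set_by_path(config, "trainer-num_nodes", 1)
set_by_path(config, "data-train-params-root", "/path/to/dataset")
```

Under this reading, the three settings live at `config["trainer"]["device"]`, `config["trainer"]["num_nodes"]`, and `config["data"]["train"]["params"]["root"]`.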

Evaluating Tokenizers

  • usage
    python -m torch.distributed.launch --standalone --use-env \
    --nproc-per-node=8 eval.py \
    --bs=32 \
    --base=$PATH_TO_YAML_CONFIG \
    --ckpt=$PATH_TO_CKPT \
    --dataset=$PATH_TO_DATASET_FOLDER

Pre-trained models and benchmark

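| spec          | config              | model               | PSNR  | SSIM  | LPIPS | rFID  |
|---------------|---------------------|---------------------|-------|-------|-------|-------|
| LFQ 2^16x1024 | sd3unet_lfq_16.yaml | sd3unet_lfq_16.ckpt | 22.65 | 0.635 | 0.141 | 3.523 |
| FSQ 2^16x1024 | sd3unet_fsq_16.yaml | sd3unet_fsq_16.ckpt | 26.87 | 0.785 | 0.072 | 1.161 |
| BSQ 2^16x1024 | sd3unet_bsq_16.yaml | sd3unet_bsq_16.ckpt | 25.62 | 0.754 | 0.086 | 1.080 |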
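
For readers unfamiliar with the PSNR column, the sketch below computes it from its standard definition, PSNR = 10 * log10(MAX^2 / MSE), on flat sequences of pixel values. It is not the code in eval.py: the function name `psnr` and the default peak value of 255 are assumptions, and the evaluation script may use a different value range or per-image averaging.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        # Identical inputs: the ratio is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR and SSIM indicate a closer reconstruction, while lower LPIPS and rFID are better.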

Reference
