
ZoomLDM: Latent Diffusion Model for multi-scale image generation

CVPR 2025

Setup

git clone https://github.com/cvlab-stonybrook/ZoomLDM/
cd ZoomLDM
conda create -n zoomldm python=3.10 -y  # create the environment first; the Python version here is an assumption
conda activate zoomldm
pip install -r requirements.txt

The model weights are hosted on Hugging Face. The inference scripts below download the weights automatically via huggingface_hub.
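
If you want to fetch a checkpoint manually, huggingface_hub can also be called directly; the repo id and filename below are placeholders (the notebooks resolve the actual locations themselves):

from huggingface_hub import hf_hub_download

# Placeholder repo_id / filename: check the inference notebooks for the real values.
ckpt_path = hf_hub_download(
    repo_id="cvlab-stonybrook/ZoomLDM",  # hypothetical repo id
    filename="zoomldm_brca.ckpt",        # hypothetical checkpoint name
)
print(ckpt_path)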

Inference

Patch-level generation



We demonstrate patch-level generation at any scale in sample_patches_brca.ipynb and sample_patches_naip.ipynb.

Large image generation



For large image generation, we use the proposed joint multi-scale sampling algorithm.

We provide an implementation of the algorithm in joint_multiscale.ipynb.
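
As a rough illustration of the idea (not the notebook's actual implementation), the cross-scale consistency step can be thought of as reconciling the low-magnification latent with a downsampled mosaic of the high-magnification latents at each denoising step:

import torch
import torch.nn.functional as F

def consistency_step(low_pred, high_preds, grid):
    """Toy sketch of cross-scale consistency: average the low-magnification
    latent with a downsampled mosaic of the high-magnification tiles, then
    propagate the correction back to every tile.
    low_pred:   (C, H, W) latent predicted at the low magnification
    high_preds: (grid*grid, C, H, W) latents predicted for the high-mag tiles"""
    C, H, W = low_pred.shape
    # Stitch the tiles into one large latent (row-major grid).
    tiles = high_preds.reshape(grid, grid, C, H, W).permute(2, 0, 3, 1, 4)
    mosaic = tiles.reshape(C, grid * H, grid * W)
    # Downsample the mosaic to the low-magnification resolution and average.
    down = F.interpolate(mosaic[None], size=(H, W), mode="area")[0]
    fused = 0.5 * (low_pred + down)
    # Spread the low-frequency correction back over every high-mag tile.
    correction = F.interpolate((fused - down)[None], size=(grid * H, grid * W), mode="nearest")[0]
    new_mosaic = mosaic + correction
    new_tiles = new_mosaic.reshape(C, grid, H, grid, W).permute(1, 3, 0, 2, 4).reshape(-1, C, H, W)
    return fused, new_tiles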

You can find more examples of large images here.

Super-resolution



Super-resolution uses the condition inversion algorithm proposed in the paper, combined with joint multi-scale sampling to enforce the low-resolution constraint.

We provide an implementation in superres.ipynb.
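
A minimal sketch of the condition-inversion idea: optimize a conditioning vector by gradient descent so that the generated output, once downsampled, matches the given low-resolution image. The generator, dimensions, and hyperparameters below are placeholders rather than the paper's exact procedure:

import torch
import torch.nn.functional as F

def invert_condition(low_res, generator, cond_dim=1024, steps=200, lr=0.05):
    """Toy condition inversion.
    low_res:   (1, 3, h, w) observed low-resolution image
    generator: differentiable callable mapping a (1, cond_dim) condition to a
               (1, 3, H, W) high-resolution image (stand-in for the diffusion model)."""
    cond = torch.zeros(1, cond_dim, requires_grad=True)
    opt = torch.optim.Adam([cond], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        generated = generator(cond)
        down = F.interpolate(generated, size=low_res.shape[-2:], mode="area")
        loss = F.mse_loss(down, low_res)  # low-resolution consistency constraint
        loss.backward()
        opt.step()
    return cond.detach()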

Training

To train the model, you need to prepare a multi-scale dataset of {image, conditioning} pairs.

Patch extraction

We use the DS-MIL codebase to extract regions from the WSIs at the base 20x magnification. Patch sizes range from 256x256 to 32768x32768 pixels; you may want to use a lower tissue threshold for the larger images.

The following command will extract 1024x1024 patches at 20x:

python deepzoom_tiler.py -m 0 -b 20 -s 1024
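
If you need the full range of patch sizes, the same command can be looped; the size list below assumes powers of two between the 256 and 32768 limits mentioned above:

import subprocess

# Run DS-MIL's deepzoom_tiler.py once per patch size, always at the base 20x magnification.
for size in [256, 512, 1024, 2048, 4096, 8192, 16384, 32768]:
    subprocess.run(
        ["python", "deepzoom_tiler.py", "-m", "0", "-b", "20", "-s", str(size)],
        check=True,
    )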

Refer to this issue for satellite image patch extraction.

Feature extraction

Histopathology

We pre-extract UNI embeddings (the conditioning) from the full-resolution images in a patch-based manner: a 2048x2048 image is split into 64 patches of 256x256, yielding a 64x1024 UNI embedding.

We then resize images to 256x256, extract VAE features, and save them together with the UNI embeddings.
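
The patch/embedding bookkeeping can be sketched as follows; uni_encoder and vae are stand-ins for the actual UNI and VAE models (loading them is not shown), so treat this as an illustration rather than our extraction script:

import torch
import torch.nn.functional as F

def extract_pair(image, uni_encoder, vae):
    """image:       (3, H, W) tensor at full resolution, H and W multiples of 256
    uni_encoder: callable mapping (N, 3, 256, 256) patches to (N, 1024) embeddings
    vae:         callable mapping a (1, 3, 256, 256) image to its latent features"""
    # Split into non-overlapping 256x256 patches, e.g. 2048x2048 -> an 8x8 grid of 64 patches.
    patches = image.unfold(1, 256, 256).unfold(2, 256, 256)            # (3, n, n, 256, 256)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, 3, 256, 256)  # (n*n, 3, 256, 256)
    uni_embeddings = uni_encoder(patches)                              # (64, 1024) for 2048x2048

    # Resize the whole image to 256x256 and encode it with the VAE.
    small = F.interpolate(image[None], size=(256, 256), mode="area")
    vae_latent = vae(small)
    return vae_latent, uni_embeddings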

For NAIP, we use the pre-trained DINO-v2 ViT-Large (dinov2_vitl14_reg) checkpoint to extract embeddings.
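
For reference, that checkpoint can be loaded directly from torch.hub; the patch-based embedding extraction itself mirrors the scheme above:

import torch

# DINOv2 ViT-L/14 with registers, as named above.
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14_reg")
dinov2.eval()
# A forward pass on a (N, 3, 224, 224) batch returns (N, 1024) CLS-token embeddings.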

For more details, please take a look at the demo datasets (brca/naip) or our dataloader scripts (brca/naip).

Training

Create a config file similar to this, which specifies the dataset, model, and training parameters.

Then, run the training script:

python main.py -t --gpus 0,1,2 --base configs/zoomldm_brca.yaml

Bibtex

@InProceedings{Yellapragada_2025_CVPR,
  author = {Yellapragada, Srikar and Graikos, Alexandros and Triaridis, Kostas and Prasanna, Prateek and Gupta, Rajarsi and Saltz, Joel and Samaras, Dimitris},
  title = {ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
  month = {June},
  year = {2025},
  pages = {23453-23463}
}
