git clone https://github.com/cvlab-stonybrook/ZoomLDM/
conda activate zoomldm
pip install -r requirements.txt
The model weights are hosted on Hugging Face. The inference scripts below download them automatically via huggingface_hub.
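If you prefer to fetch the checkpoints yourself, a minimal sketch using huggingface_hub is shown below; the repo id is a placeholder, and the actual id is set inside the notebooks.

from huggingface_hub import snapshot_download

# Download every file in the model repository into the local HF cache and return its path.
# NOTE: "<hf-user>/ZoomLDM" is a placeholder repo id; use the one referenced in the notebooks.
ckpt_dir = snapshot_download(repo_id="<hf-user>/ZoomLDM")
print(ckpt_dir)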

We demonstrate patch-level generation at any scale in sample_patches_brca.ipynb and sample_patches_naip.ipynb.


For large image generation, we use the proposed joint multi-scale sampling algorithm.
We provide an implementation of the algorithm in joint_multiscale.ipynb.
You can find more examples of large images here.
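As a rough illustration of the idea (a schematic sketch, not the repository's implementation): joint multi-scale sampling keeps the scales consistent during denoising, so the high-magnification tile estimates, once stitched and downsampled, should agree with the low-magnification estimate. The tensor shapes, tile layout, and the simple additive correction below are assumptions made for illustration; see joint_multiscale.ipynb for the actual algorithm.

import torch
import torch.nn.functional as F

def multiscale_consistency(x0_low, x0_high_tiles, grid, weight=1.0):
    # x0_low:        (1, C, H, W) clean-latent estimate at the low magnification.
    # x0_high_tiles: (grid*grid, C, H, W) clean-latent estimates of the high-magnification
    #                tiles, ordered row-major over the large image.
    n, c, h, w = x0_high_tiles.shape
    # Stitch the tiles into one large latent...
    stitched = (x0_high_tiles.reshape(grid, grid, c, h, w)
                .permute(2, 0, 3, 1, 4)
                .reshape(1, c, grid * h, grid * w))
    # ...and downsample it to the low-magnification resolution.
    down = F.interpolate(stitched, size=x0_low.shape[-2:], mode="bilinear", align_corners=False)
    # Nudge the low-magnification estimate towards the stitched high-magnification view.
    return x0_low + weight * (down - x0_low)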


Super-resolution uses the condition inversion algorithm proposed in the paper, together with joint multi-scale sampling to enforce the low-resolution constraint.
We provide an implementation in superres.ipynb.
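At a high level, condition inversion searches for a conditioning embedding under which the diffusion model best explains the given low-resolution input; that embedding is then reused when sampling at higher magnification. The sketch below is illustrative only: it assumes the CompVis latent-diffusion interface (q_sample, apply_model) and a 1x1024 embedding, while the actual objective, embedding shape, and sampling loop are in superres.ipynb.

import torch
import torch.nn.functional as F

def invert_condition(model, z_lowres, cond_dim=1024, steps=500, lr=1e-2):
    # z_lowres: (1, C, H, W) VAE latent of the low-resolution image we want to match.
    cond = torch.zeros(1, cond_dim, device=z_lowres.device, requires_grad=True)
    opt = torch.optim.Adam([cond], lr=lr)
    for _ in range(steps):
        t = torch.randint(0, model.num_timesteps, (1,), device=z_lowres.device)
        noise = torch.randn_like(z_lowres)
        z_t = model.q_sample(z_lowres, t, noise=noise)   # noise the latent to step t
        eps = model.apply_model(z_t, t, cond)            # model's noise prediction given `cond`
        loss = F.mse_loss(eps, noise)                    # standard denoising objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return cond.detach()  # reuse this embedding to condition higher-magnification sampling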
To train the model, you need to prepare a multi-scale dataset of {image, conditioning} pairs.
We use the DS-MIL codebase to extract regions from the WSIs, starting at the base 20x magnification. Patch sizes range from 256x256 to 32768x32768 pixels. For larger patches, you may want to use a lower tissue threshold.
The following command will extract 1024x1024 patches at 20x:
python deepzoom_tiler.py -m 0 -b 20 -s 1024
Refer to this issue for satellite image patch extraction.
We pre-extract UNI embeddings (the conditioning) from the full-resolution images in a patch-based manner: a 2048x2048 image is split into 64 patches of 256x256, yielding a 64x1024 UNI embedding.
We then resize the images to 256x256, extract VAE features, and save them together with the UNI embeddings.
For NAIP, we use the pre-trained DINOv2 ViT-Large (dinov2_vitl14_reg) checkpoint to extract embeddings.
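The patch-based embedding extraction can be pictured as follows. This is a rough sketch with placeholder names, not the repository's preprocessing script; uni_model stands for whichever backbone produces the 1024-d features (UNI for pathology, DINOv2 for NAIP).

import torch

# For NAIP, the backbone can be loaded from torch.hub, e.g.:
#   dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14_reg")

def extract_patch_embeddings(image, uni_model, patch=256):
    # image: (3, H, W) tensor with H and W divisible by `patch`,
    #        e.g. a 2048x2048 region gives 8x8 = 64 patches.
    c, h, w = image.shape
    patches = image.unfold(1, patch, patch).unfold(2, patch, patch)        # (3, H/p, W/p, p, p)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)  # (N, 3, p, p), N = 64 here
    with torch.no_grad():
        emb = uni_model(patches)   # (N, 1024) -- the conditioning saved alongside the VAE latents
    return emb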
Please take a look at the demo datasets: brca/naip or our dataloader scripts: brca/naip for more details.
Create a config file similar to this, which specifies the dataset, model, and training parameters.
Then, run the training script:
python main.py -t --gpus 0,1,2 --base configs/zoomldm_brca.yaml
@InProceedings{Yellapragada_2025_CVPR,
author = {Yellapragada, Srikar and Graikos, Alexandros and Triaridis, Kostas and Prasanna, Prateek and Gupta, Rajarsi and Saltz, Joel and Samaras, Dimitris},
title = {ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {23453-23463}
}