hjl1013/SyncSDE

SyncSDE: A Probabilistic Framework for Diffusion Synchronization

This repository contains the official implementation of "SyncSDE: A Probabilistic Framework for Diffusion Synchronization" (CVPR 2025).

Installation

This repository has been tested with Python 3.9 and CUDA 11.8.

pip install -r requirements.txt
pip install -e .

For 3D mesh texturing, install PyTorch3D:

conda install -c conda-forge cupy
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu118_pyt270/download.html

If a NumPy version error occurs, reinstall NumPy via

conda install -c conda-forge numpy

and then switch to version 1.22.4.
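
Before running the scripts, it can help to confirm which versions are actually installed. The following is a minimal sketch using only the Python standard library. The helper names `installed_version` and `environment_report` are illustrative and not part of this repository, and the target versions are simply the ones stated above.

```python
import sys
from importlib import metadata

# Versions stated in this README; adjust if your setup differs.
TESTED_PYTHON = (3, 9)
SUGGESTED_NUMPY = "1.22.4"


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Collect the interpreter and package versions relevant to the setup."""
    return {
        "python": sys.version_info[:2],
        "python_matches_tested": sys.version_info[:2] == TESTED_PYTHON,
        "numpy": installed_version("numpy"),
        "torch": installed_version("torch"),
        "pytorch3d": installed_version("pytorch3d"),
    }


if __name__ == "__main__":
    report = environment_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    if report["numpy"] not in (None, SUGGESTED_NUMPY):
        print(f"note: NumPy {SUGGESTED_NUMPY} is the version suggested above")
```

A value of `None` means the package is not installed in the active environment, which is expected for `pytorch3d` unless mesh texturing is needed.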

Running SyncSDE

Mask-based Text-to-Image Generation

python src/mask_based_T2I.py --prompt_path "data/prompt_mask_T2I.txt" \
                             --mask_path "data/mask_sample.png" \
                             --results_folder "./output/mask_T2I" \
                             --random_seed 0 --use_float_16 --inv_lambda 5.0

Text-driven Real Image Editing

First, invert the real image using DDIM inversion.

python src/inversion.py --input_image "data/image.png" \
                        --results_folder "output/real_image_editing" \
                        --use_float_16

Then edit the real image.

python src/real_image_editing.py --inversion "./output/real_image_editing/inversion/image.pt" \
                                 --prompt "./output/real_image_editing/prompt/image.txt" \
                                 --results_folder "./output/real_image_editing" --task_name "cat2dog" \
                                 --random_seed 0 --use_float_16 --inv_lambda 5.0 
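
The two steps above can be chained from Python. The sketch below only builds the two command lines. It assumes, based solely on the example paths shown here, that the inversion script writes `inversion/<name>.pt` and `prompt/<name>.txt` under the results folder, where `<name>` is the input image's file stem. The helper `editing_commands` is hypothetical and not part of this repository.

```python
import shlex
from pathlib import Path


def editing_commands(input_image, results_folder, task_name,
                     random_seed=0, inv_lambda=5.0):
    """Build the inversion and editing commands for one input image.

    Assumption: outputs follow the layout seen in the README examples,
    i.e. <results>/inversion/<stem>.pt and <results>/prompt/<stem>.txt.
    """
    stem = Path(input_image).stem
    results = Path(results_folder)
    inversion = (results / "inversion" / f"{stem}.pt").as_posix()
    prompt = (results / "prompt" / f"{stem}.txt").as_posix()

    invert_cmd = [
        "python", "src/inversion.py",
        "--input_image", str(input_image),
        "--results_folder", results.as_posix(),
        "--use_float_16",
    ]
    edit_cmd = [
        "python", "src/real_image_editing.py",
        "--inversion", inversion,
        "--prompt", prompt,
        "--results_folder", results.as_posix(),
        "--task_name", task_name,
        "--random_seed", str(random_seed),
        "--use_float_16",
        "--inv_lambda", str(inv_lambda),
    ]
    return invert_cmd, edit_cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected.
    for cmd in editing_commands("data/image.png",
                                "output/real_image_editing", "cat2dog"):
        print(shlex.join(cmd))
```

To execute the steps rather than print them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so the editing step only starts if the inversion succeeded.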

Wide Image Generation

python src/wide_image.py --prompt_path "./data/prompt_wide_image.txt" \
                         --results_folder "output/wide_image" \
                         --n_patches 13 --init_xt_from_zt \
                         --random_seed 0 --use_float_16 --inv_lambda 5.0

Ambiguous Image Generation

[NOTE] You may need to upgrade the transformers package to use the fp16 variant of the pretrained DeepFloyd-IF model.

To use the pretrained DeepFloyd-IF model, please log in to Hugging Face by following the instructions.

Then, run the script:

python src/ambiguous_image.py --prompt_path "data/prompt_ambiguous.txt" \
                              --transform "rotate_cw" \
                              --results_folder "./output/ambiguous_image" \
                              --random_seed 0 --use_float_16 --inv_lambda 5.0

Five transforms are available: 'rotate_cw', 'rotate_ccw', 'rotate_180', 'skew', and 'flip'.
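
To compare the transforms side by side, the command above can be generated once per transform. This is a minimal sketch: the flags are copied from the example command, while the helper `ambiguous_image_command` and the per-transform output subfolders are assumptions made for illustration.

```python
import shlex

# The transform names listed in this README.
TRANSFORMS = ("rotate_cw", "rotate_ccw", "rotate_180", "skew", "flip")


def ambiguous_image_command(transform,
                            prompt_path="data/prompt_ambiguous.txt",
                            output_root="./output/ambiguous_image",
                            random_seed=0, inv_lambda=5.0):
    """Build the ambiguous-image command for a single transform."""
    if transform not in TRANSFORMS:
        raise ValueError(f"unknown transform: {transform!r}")
    return [
        "python", "src/ambiguous_image.py",
        "--prompt_path", prompt_path,
        "--transform", transform,
        # Assumption: one subfolder per transform keeps results separate.
        "--results_folder", f"{output_root}/{transform}",
        "--random_seed", str(random_seed),
        "--use_float_16",
        "--inv_lambda", str(inv_lambda),
    ]


if __name__ == "__main__":
    # Print one command per transform; run them from the repository root.
    for name in TRANSFORMS:
        print(shlex.join(ambiguous_image_command(name)))
```

Rejecting unknown names up front avoids loading the pretrained model only to fail on a mistyped transform.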

Mesh Texturing

python src/mesh_texturing.py --prompt_path "./data/prompt_mesh_texturing.txt" \
                             --mesh_path "./data/clutch_bag.obj" \
                             --results_folder "./output/mesh_texturing" \
                             --random_seed 0 --inv_lambda 5.0

Acknowledgments

This repository is built on Conditional Score Guidance and DeepFloyd-IF. The source image for text-driven real image editing is taken from the LAION-5B dataset.
