🚩 Accepted by ICCV 2025
Qiaosi Yi<sup>1,2</sup> | Shuai Li<sup>1</sup> | Rongyuan Wu<sup>1,2</sup> | Lingchen Sun<sup>1,2</sup> | Yuhui Wu<sup>1,2</sup> | Lei Zhang<sup>1,2</sup>

<sup>1</sup>The Hong Kong Polytechnic University, <sup>2</sup>OPPO Research Institute
- 2025.7.29: Paper is released on ArXiv.
- 2025.7.28: The training code and testing code are released.
- 2025.7.24: The repo is released.
⭐ If TVT is helpful to your images or projects, please help star this repo. Thanks! 🤗
- Release the code for inference.
- Update the code for training.
- fp16 VAED4.
## Installation

```
# clone this repository
git clone https://github.com/Joyies/TVT.git
cd TVT

# create an environment
conda create -n TVT python=3.10
conda activate TVT
pip install --upgrade pip
pip install -r requirements.txt
```
## Inference

- Download the pretrained SD-2.1-base model from Hugging Face (`stabilityai/stable-diffusion-2-1-base`).
- Download the model weights (VAED4, TVT model, TVTUNet, DAPE, and RAM) and put them in the `ckp/` directory.
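If you prefer fetching the base model programmatically rather than from the website, a minimal sketch with `huggingface_hub` works; note that using `snapshot_download` here is our suggestion, not a step from the repository's own instructions.

```python
from huggingface_hub import snapshot_download

# Download SD-2.1-base to the local Hugging Face cache; the diffusers
# pipeline can also fetch it automatically when given this repo id.
snapshot_download(repo_id="stabilityai/stable-diffusion-2-1-base")
```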
Modify `input_path` and `output_path` in the testing command below: `input_path` is the path of the test image and `output_path` is the directory where the output images will be saved.
```
python TVT/inferences/inference.py \
    --input_image input_path \
    --output_dir output_path \
    --pretrained_path ckp/model_TVT.pkl \
    --pretrained_model_name_or_path stabilityai/stable-diffusion-2-1-base \
    --pretrained_unet_path ckp/TVTUNet \
    --vae4d_path ckp/vae.ckpt \
    --ram_ft_path ckp/DAPE.pth \
    --negprompt 'dotted, noise, blur, lowres, smooth' \
    --prompt 'clean, high-resolution, 8k' \
    --upscale 4 \
    --time_step 1
```
or

```
bash scripts/test/test_realsr.sh
```
We also provide tiled inference to save GPU memory. Run the command below and adjust the tile size and overlap to fit the VRAM of your device.
```
python TVT/inferences/inference_tile.py \
    --input_image input_path \
    --output_dir output_path \
    --pretrained_path ckp/model_TVT.pkl \
    --pretrained_model_name_or_path stabilityai/stable-diffusion-2-1-base \
    --pretrained_unet_path ckp/TVTUNet \
    --vae4d_path ckp/vae.ckpt \
    --ram_ft_path ckp/DAPE.pth \
    --negprompt 'dotted, noise, blur, lowres, smooth' \
    --prompt 'clean, high-resolution, 8k' \
    --upscale 4 \
    --time_step 1 \
    --tiled_size 96 \
    --tiled_overlap 32
```
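For intuition, tiled inference splits the input into overlapping tiles, restores each tile independently, and blends the overlapping regions of the upscaled outputs. Below is a minimal NumPy sketch of this idea, not the repository's actual implementation: `tiled_restore` and `restore_fn` are hypothetical names, `restore_fn` stands in for any per-tile ×4 restoration call, and the defaults mirror the `--tiled_size 96` / `--tiled_overlap 32` flags above.

```python
import numpy as np

def tiled_restore(img, restore_fn, tile=96, overlap=32, scale=4):
    """Restore `img` (H, W, C) tile by tile, averaging overlapping regions.

    Assumes H and W are at least `tile` pixels. `restore_fn` maps a
    (tile, tile, C) array to a (tile*scale, tile*scale, C) array.
    """
    h, w, c = img.shape
    stride = tile - overlap
    out = np.zeros((h * scale, w * scale, c), dtype=np.float32)
    weight = np.zeros((h * scale, w * scale, 1), dtype=np.float32)
    for y in range(0, h - overlap, stride):
        for x in range(0, w - overlap, stride):
            # Clamp the last tile in each direction to the image border.
            y0, x0 = min(y, h - tile), min(x, w - tile)
            sr = restore_fn(img[y0:y0 + tile, x0:x0 + tile])
            ys, xs = y0 * scale, x0 * scale
            out[ys:ys + tile * scale, xs:xs + tile * scale] += sr
            weight[ys:ys + tile * scale, xs:xs + tile * scale] += 1.0
    return out / np.maximum(weight, 1e-8)  # simple averaging in overlaps
```

Larger tiles mean fewer model calls but more VRAM per call; a larger overlap hides seams between tiles at the cost of redundant computation.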
## Training

### Training VAED4

- Download the OpenImage and LSDIR datasets. For each image in the LSDIR dataset, crop multiple 512×512 image patches using a sliding window with a stride of 64 pixels (a sketch of this cropping is given after this list).
- The LDM code is used to train VAED4.
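For reference, here is a minimal Pillow sketch of the stride-64 sliding-window cropping described above; the function names, paths, and file-naming scheme are illustrative, not the repository's actual preprocessing script. The same `crop_patches` routine applies to the LSDIR images in the TVT training step below, where FFHQ images are instead resized directly to 512×512 (`resize_to_patch`).

```python
from pathlib import Path
from PIL import Image

PATCH, STRIDE = 512, 64  # 512x512 crops, 64-pixel sliding-window stride

def crop_patches(img_path, out_dir):
    """Save every 512x512 patch of one image, sliding with a stride of 64."""
    img_path, out_dir = Path(img_path), Path(out_dir)
    img = Image.open(img_path).convert("RGB")
    w, h = img.size
    for top in range(0, h - PATCH + 1, STRIDE):
        for left in range(0, w - PATCH + 1, STRIDE):
            patch = img.crop((left, top, left + PATCH, top + PATCH))
            patch.save(out_dir / f"{img_path.stem}_{top}_{left}.png")

def resize_to_patch(img_path, out_dir):
    """Resize one image (e.g., FFHQ) directly to 512x512."""
    img = Image.open(img_path).convert("RGB")
    img.resize((PATCH, PATCH), Image.BICUBIC).save(Path(out_dir) / Path(img_path).name)
```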
### Training TVT

- Download the LSDIR dataset and the first 10k images of the FFHQ dataset. Then augment the training data: for each image in the LSDIR dataset, crop multiple 512×512 image patches using a sliding window with a stride of 64 pixels (as sketched above); for the FFHQ images, directly resize them to 512×512.
- Download the VAED4, TVTUNet, and RAM models, and put them into `ckp/`.
- Start training:
```
accelerate launch --gpu_ids=0,1,2,3, --num_processes=4 TVT/train_TVTSR/train.py \
    --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" \
    --pretrained_model_name_or_path_vsd="stabilityai/stable-diffusion-2-1-base" \
    --pretrained_unet_path='ckp/TVTUNet' \
    --vae4d_path='ckp/vae.ckpt' \
    --dataset_folder="data_path" \
    --testdataset_folder="test_path" \
    --resolution=512 \
    --learning_rate=5e-5 \
    --train_batch_size=2 \
    --gradient_accumulation_steps=2 \
    --enable_xformers_memory_efficient_attention \
    --eval_freq 500 \
    --checkpointing_steps 500 \
    --mixed_precision='fp16' \
    --report_to "tensorboard" \
    --output_dir="output_path" \
    --lora_rank_unet_vsd=4 \
    --lora_rank_unet=4 \
    --lambda_lpips=2 \
    --lambda_l2=1 \
    --lambda_vsd=1 \
    --lambda_vsd_lora=1 \
    --min_dm_step_ratio=0.25 \
    --max_dm_step_ratio=0.75 \
    --use_vae_encode_lora \
    --align_method="adain" \
    --use_online_deg \
    --deg_file_path="params_TVT.yml" \
    --negative_prompt='painting, oil painting, illustration, drawing, art, sketch, oil painting, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth' \
    --test_image_prep='no_resize' \
    --time_step=1 \
    --tracker_project_name "experiment_track_name"
```
or

```
bash scripts/train/train.sh
```
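Training reports metrics to TensorBoard (`--report_to "tensorboard"`). Assuming the event files land under your `--output_dir` (the usual layout for diffusers/accelerate training scripts; check your run directory if not), you can monitor progress with:

```
tensorboard --logdir output_path
```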
## Citation

If our code helps your research or work, please consider citing our paper. The following is a BibTeX reference:
```
@inproceedings{yi2025fine,
  title={Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training},
  author={Yi, Qiaosi and Li, Shuai and Wu, Rongyuan and Sun, Lingchen and Wu, Yuhui and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```
This project is released under the Apache 2.0 license.
If you have any questions, please contact: qiaosiyijoyies@gmail.com
This project is based on diffusers, LDM, OSEDiff, and PiSA-SR. Thanks for their awesome work.