📖 TL;DR: Any-to-Bokeh is a novel one-step video bokeh framework that renders arbitrary input videos with temporally coherent, depth-aware bokeh effects.
- [2025-07-11] 🎉 We have officially released the model weights for public use! You can now download the pretrained weights via Google Drive.
- Release the demo inference files
- Release the inference pipeline
- Release the model weights
- Release the training files
conda create -n any2bokeh python=3.10 -y
conda activate any2bokeh
# The default CUDA version is 12.4; please adjust it to match your configuration.
# Install PyTorch.
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
# Clone repo
git clone https://github.com/vivoCameraResearch/any-to-bokeh.git
cd any-to-bokeh
pip install -r requirements.txt
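
As a quick sanity check (a minimal sketch, not part of the official setup), you can verify that the installed PyTorch build sees your GPU before proceeding:

```python
# Quick environment check: prints the installed PyTorch version and CUDA availability.
import torch

print(f"PyTorch version: {torch.__version__}")        # expected: 2.4.1
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device:         {torch.cuda.get_device_name(0)}")
```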
We provide 8 demo videos obtained from the DAVIS dataset.
- Download the pre-trained weights from Google Drive and place them in the `./checkpoints` folder.
- Run the demo script `python test/inference_demo.py`. The results will be saved in the `./output` folder.
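
If the results are written as per-frame images, a small helper like the one below can stitch them into a video for quick viewing. This is an illustrative sketch only; the actual layout under `./output` may differ.

```python
# Illustrative helper: stitch rendered frames (assumed PNGs under ./output/<video_name>/)
# back into an mp4 for quick inspection. The folder layout is an assumption.
import glob
import cv2

frames = sorted(glob.glob("output/demo_video/*.png"))  # hypothetical output layout
assert frames, "no rendered frames found"
h, w = cv2.imread(frames[0]).shape[:2]
writer = cv2.VideoWriter("bokeh_preview.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 24, (w, h))
for path in frames:
    writer.write(cv2.imread(path))
writer.release()
```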
Before bokeh rendering, two data preprocessing steps are required.
We recommend using SAM2 to obtain the mask of the focus target.
First, split the video into frames and place them in a folder using `utils/split_mp4.py`:
python utils/split_mp4.py input.mp4
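
For reference, a frame-splitting step of this kind typically looks like the sketch below. This is an illustrative re-implementation, not the contents of `utils/split_mp4.py`; the frame naming convention is an assumption.

```python
# Illustrative sketch: split an mp4 into numbered frames with OpenCV.
# Frame naming (00000.png, 00001.png, ...) is an assumption; check utils/split_mp4.py
# for the naming convention the pipeline actually expects.
import os
import sys
import cv2

video_path = sys.argv[1]                    # e.g. input.mp4
out_dir = os.path.splitext(video_path)[0]   # frames go into a folder named after the video
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(os.path.join(out_dir, f"{idx:05d}.png"), frame)
    idx += 1
cap.release()
print(f"Wrote {idx} frames to {out_dir}")
```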
Then, install Video-Depth-Anything and run our script to obtain depth (disparity) information for each frame:
# --mask_folder is the path to the masks obtained via SAM2.
python utils/pre_process.py \
    --img_folder path/to/images \
    --mask_folder path/to/masks \
    --disp_dir output/directory
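
Before running the preprocessing, it can help to confirm that the frame and mask folders line up. The check below is a small sketch under the assumption that frames and masks share identical filenames.

```python
# Sanity check (illustrative): verify that every frame has a matching SAM2 mask
# before running utils/pre_process.py. Assumes frames and masks share filenames.
import os

img_folder = "path/to/images"
mask_folder = "path/to/masks"

imgs = sorted(os.listdir(img_folder))
masks = set(os.listdir(mask_folder))
missing = [f for f in imgs if f not in masks]
if missing:
    print(f"{len(missing)} frames have no mask, e.g. {missing[:3]}")
else:
    print(f"All {len(imgs)} frames have matching masks.")
```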
Write the folder `aif_folder` that stores the video frames, the corresponding preprocessed disparity folder `disp_folder`, and the value `k` representing the bokeh intensity into a CSV file in the following format (like `demo.csv`); a small helper sketch for generating such a CSV is shown after the table:
aif_folder | disp_folder | k |
---|---|---|
demo_dataset/videos/xxx | demo_dataset/disp/xxx | 16 |
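
A CSV in this format can be generated with a few lines of Python. The snippet below is a sketch: the column names follow the table above, while the output filename and paths are placeholders.

```python
# Illustrative sketch: write the per-video configuration CSV (columns: aif_folder, disp_folder, k).
# The row values and output filename are placeholders; adjust them to your own data layout.
import csv

rows = [
    {"aif_folder": "demo_dataset/videos/xxx", "disp_folder": "demo_dataset/disp/xxx", "k": 16},
]

with open("csv_file/my_demo.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["aif_folder", "disp_folder", "k"])
    writer.writeheader()
    writer.writerows(rows)
```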
Then, run the script
python test/inference_demo.py --val_csv_path csv_file/demo.csv
First, define the blur strength `k` for each frame. Specifically, the filename of each frame's depth file needs to be modified accordingly. We provide a simple modification script for this purpose; an illustrative sketch of such a renaming pass is shown below.
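
As a rough illustration of such a renaming pass: the `_k_<value>` suffix used here is purely an assumption for illustration, so refer to the provided modification script for the naming convention the pipeline actually parses.

```python
# Illustrative only: tag each frame's disparity file with a per-frame blur strength.
# The "_k_<value>" suffix is a hypothetical naming convention; use the provided
# modification script to see the format the pipeline actually expects.
import os

disp_dir = "demo_dataset/disp_change_k/xxx"        # placeholder path
k_per_frame = {"00000.png": 8, "00001.png": 16}    # hypothetical per-frame blur strengths

for name, k in k_per_frame.items():
    stem, ext = os.path.splitext(name)
    src = os.path.join(disp_dir, name)
    dst = os.path.join(disp_dir, f"{stem}_k_{k}{ext}")
    if os.path.exists(src):
        os.rename(src, dst)
```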
Next, the CSV configuration from case 1 should be updated to the following template (e.g., `demo_change_k.csv`):
aif_folder | disp_folder | k |
---|---|---|
demo_dataset/videos/xxx | demo_dataset/disp_change_k/xxx | change |
Then, run the script
python test/inference_demo.py --val_csv_path csv_file/demo_change_k.csv
We use the number following the `_zf_` tag in the disparity filename to represent the disparity value of the focus plane. You can customize this value for each frame to adjust the focus plane. We provide a simple modification script for this purpose; an illustrative sketch of such a renaming pass is shown below.
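
A minimal sketch of such a renaming pass, assuming the disparity filenames contain a literal `_zf_<value>` tag; the exact filename pattern and the example disparity values are assumptions, so check the provided modification script for the real format.

```python
# Illustrative only: rewrite the _zf_<value> tag in disparity filenames to move the
# focus plane per frame. The filename pattern is an assumption; consult the provided
# modification script for the format used by the pipeline.
import os
import re

disp_dir = "demo_dataset/disp_change_f/xxx"        # placeholder path
new_zf_per_frame = {"00000": 0.35, "00001": 0.40}  # hypothetical focus-plane disparities

for name in os.listdir(disp_dir):
    stem, ext = os.path.splitext(name)
    frame_id = stem.split("_")[0]                  # assumes names like 00000_zf_0.50.png
    if frame_id in new_zf_per_frame:
        new_stem = re.sub(r"_zf_[0-9.]+", f"_zf_{new_zf_per_frame[frame_id]}", stem)
        os.rename(os.path.join(disp_dir, name), os.path.join(disp_dir, new_stem + ext))
```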
Next, the CSV configuration is the same as in case 1 (e.g., `demo_change_f.csv`):
aif_folder | disp_folder | k |
---|---|---|
demo_dataset/videos/xxx | demo_dataset/disp_change_f/xxx | 16 |
Then, run the script
python test/inference_demo.py --val_csv_path csv_file/demo_change_f.csv
This codebase builds on SVD_Xtend. Thanks for open-sourcing! We also acknowledge the following great open-source projects:
- SAM2 (https://github.com/facebookresearch/sam2).
- Video-Depth-Anything (https://github.com/DepthAnything/Video-Depth-Anything).
@article{yang2025any,
title={Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion},
author={Yang, Yang and Zheng, Siming and Chen, Jinwei and Wu, Boxi and He, Xiaofei and Cai, Deng and Li, Bo and Jiang, Peng-Tao},
journal={arXiv preprint arXiv:2505.21593},
year={2025}
}
If you have any questions or suggestions for improvement, please email Yang Yang (yangyang98@zju.edu.cn) or open an issue.