Official implementation of AnchorSync (ACM MM 2025).
This repository contains the reproduced version of AnchorSync: Global Consistency Optimization for Long Video Editing.
Suppose the AnchorSync codebase path is ${AnchorSync_HOME}. Then follow the steps below.
cd ${AnchorSync_HOME}
conda create -n anchorsync python=3.10
conda activate anchorsync
python3 -m pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118
python3 -m pip install -r requirements.txt --no-deps
python3 -m pip install xformers==0.0.25 --index-url https://download.pytorch.org/whl/cu118
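Optionally, you can verify the environment with a short Python snippet. This is just a minimal sanity check, not part of the repository:

```python
# Minimal environment sanity check (optional, not provided by the repo).
import torch
import torchvision
import xformers

print("torch:", torch.__version__)              # expected: 2.2.1+cu118
print("torchvision:", torchvision.__version__)  # expected: 0.17.1+cu118
print("xformers:", xformers.__version__)        # expected: 0.0.25
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```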
Download videos from the Panda-70M dataset into ${AnchorSync_HOME}/data. The ${AnchorSync_HOME}/data/Panda-70M folder should be organized as follows:
└── data
    └── Panda-70M
        ├── train
        ├── test
        └── video_files.json
video_files.json records the video storage paths; you can generate it with the provided script:
python get_pandas.py
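If you only need a rough idea of what such a path index contains, the sketch below builds a per-split list of video paths. Note that the actual schema produced by get_pandas.py may differ; this is purely illustrative:

```python
# Hypothetical sketch: index the Panda-70M videos by split.
# The real video_files.json written by get_pandas.py may use a different schema.
import json
from pathlib import Path

data_root = Path("data/Panda-70M")
index = {
    split: sorted(str(p) for p in (data_root / split).rglob("*.mp4"))
    for split in ("train", "test")
}

with open(data_root / "video_files.json", "w") as f:
    json.dump(index, f, indent=2)
```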
Download stable-diffusion-v1-5, the Canny ControlNet for SD 1.5, and stable-video-diffusion-img2vid-xt, then update the corresponding checkpoint paths. Alternatively, the checkpoints are downloaded automatically at runtime (the default).
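If you prefer to fetch the checkpoints ahead of time, a sketch using huggingface_hub is shown below. The repo IDs and local directories are assumptions (the Hub locations commonly used for these models) and should be adjusted to match the checkpoint paths expected by the scripts; stable-video-diffusion-img2vid-xt is gated and may require `huggingface-cli login` first:

```python
# Optional pre-download of the base checkpoints (repo IDs are assumptions).
from huggingface_hub import snapshot_download

snapshot_download("stable-diffusion-v1-5/stable-diffusion-v1-5",
                  local_dir="checkpoints/stable-diffusion-v1-5")
snapshot_download("lllyasviel/sd-controlnet-canny",
                  local_dir="checkpoints/sd-controlnet-canny")
snapshot_download("stabilityai/stable-video-diffusion-img2vid-xt",
                  local_dir="checkpoints/stable-video-diffusion-img2vid-xt")
```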
First, train the joint diffusion model for the first stage:
bash train_models/train_scripts/train_joint_frame_lora.sh
Second, train the multimodal ControlNet for SVD:
bash train_models/train_scripts/train_controlnet_canny+flow.sh
If you prefer not to train, you can download the pretrained joint frame LoRA to {joint_lora_path} and the multimodal ControlNet to {multimodal_controlnet_path}.
For example, the ${AnchorSync_HOME}/output_dir folder should be organized as follows:
└── output_dir
    ├── joint_frame_lora
    └── multimodal-controlnet
Put your videos in data/, named "{case_name}.mp4". Then run the pipeline as shown below:
- DDIM inversion of the first process (jointly edit anchor frames)
python run_models/run_inference_joint_frame_video_fusion_guidance_inversion.py --case_name "mountain-new" --invert_prompt "Vast Mountain Landscape under Clear Blue Sky" --joint_lora_dir "output_dir/joint_frame_lora"
- Forward editing of the first process (jointly edit anchor frames)
python run_models/run_inference_joint_frame_video_fusion_guidance_forward.py --case_name "mountain-new" --invert_prompt "Vast Mountain Landscape under Clear Blue Sky" --prompt "Chinese Ink Wash Painting of Mountain Landscape under Clear Sky" --joint_lora_dir "output_dir/joint_frame_lora"
- Second process (multimodal-guided interpolation)
python run_models/run_inference_trans_controlnet_canny_flow_video_fusion_guidance_pnp.py --case_name "mountain-new" --prompt "Chinese Ink Wash Painting of Mountain Landscape under Clear Sky" --multimodal_controlnet_path "output_dir/multimodal-controlnet"
We also recommend trying longer videos, for example:
python run_models/run_inference_joint_frame_video_fusion_guidance_inversion.py --case_name "forest-2" --invert_prompt "A forest path in morning sunlight with green trees and long shadows" --joint_lora_dir "output_dir/joint_frame_lora"
python run_models/run_inference_joint_frame_video_fusion_guidance_forward.py --case_name "forest-2" --invert_prompt "A forest path in morning sunlight with green trees and long shadows" --prompt "A forest path covered in snow during a winter sunrise" --joint_lora_dir "output_dir/joint_frame_lora"
python run_models/run_inference_trans_controlnet_canny_flow_video_fusion_guidance_pnp.py --case_name "forest-2" --prompt "A forest path covered in snow during a winter sunrise" --multimodal_controlnet_path "output_dir/multimodal-controlnet"
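For convenience, the three steps can be chained in a small driver script. The sketch below is not part of the repository; it simply shells out to the commands above for a single case:

```python
# Hypothetical convenience wrapper: run inversion -> forward editing ->
# multimodal-guided interpolation for one case, aborting if any step fails.
import subprocess

case_name = "forest-2"
invert_prompt = "A forest path in morning sunlight with green trees and long shadows"
prompt = "A forest path covered in snow during a winter sunrise"

steps = [
    ["python", "run_models/run_inference_joint_frame_video_fusion_guidance_inversion.py",
     "--case_name", case_name, "--invert_prompt", invert_prompt,
     "--joint_lora_dir", "output_dir/joint_frame_lora"],
    ["python", "run_models/run_inference_joint_frame_video_fusion_guidance_forward.py",
     "--case_name", case_name, "--invert_prompt", invert_prompt, "--prompt", prompt,
     "--joint_lora_dir", "output_dir/joint_frame_lora"],
    ["python", "run_models/run_inference_trans_controlnet_canny_flow_video_fusion_guidance_pnp.py",
     "--case_name", case_name, "--prompt", prompt,
     "--multimodal_controlnet_path", "output_dir/multimodal-controlnet"],
]

for cmd in steps:
    subprocess.run(cmd, check=True)
```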
This repository builds on several great open-source codebases. We thank their authors for their contributions to the community.
If this work is helpful for your research, please consider citing the following BibTeX entry.