This is the official PyTorch implementation of FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Synthesis.
- [2025-5-21]: Add training code and data preparation.
For complete installation instructions, please see INSTALL.md.
Static scene video generation given an image and a camera trajectory:

```bash
python video_generate.py --input_image_path ./assets/room.png --output_dir ./results-single-traj
```
You can pass the `--traj` argument to specify camera movements; the basic movements are defined in `ops/utils/all_traj.py`. The supported camera movements include `["up", "down", "left", "right", "forward", "backward", "rotate_left", "rotate_right"]`.

```bash
python video_generate.py --input_image_path ./assets/room.png --output_dir ./results-single-traj --traj backward
```
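For intuition, here is a minimal sketch of how a basic movement such as `backward` could be represented as a sequence of camera poses. The function name, frame count, and pose convention are illustrative assumptions; the actual definitions live in `ops/utils/all_traj.py`.

```python
import numpy as np

def make_backward_traj(num_frames: int = 49, step: float = 0.05) -> np.ndarray:
    """Hypothetical sketch: a 'backward' movement as (num_frames, 4, 4)
    camera-to-world matrices that translate the camera along -z."""
    poses = []
    for i in range(num_frames):
        c2w = np.eye(4)
        c2w[2, 3] = -step * i  # assumed convention: -z moves the camera backward
        poses.append(c2w)
    return np.stack(poses)

print(make_backward_traj().shape)  # (49, 4, 4)
```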
You can also generate videos that share the same camera trajectories as those in DL3DV and Re10K; just pass the video path to the `--traj` argument.

```bash
python video_generate.py --input_image_path ./assets/room.png --output_dir ./results-single-traj --traj ./path_to_dl3dv/1.mp4
```
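If you have a folder of such trajectory videos, a simple driver can reuse the CLI above for each of them. This is a sketch: the folder path is a placeholder, and only the flags shown above are used.

```python
import subprocess
from pathlib import Path

traj_dir = Path("./path_to_dl3dv")  # placeholder folder of trajectory videos
for traj_video in sorted(traj_dir.glob("*.mp4")):
    # One output directory per trajectory, named after the video file.
    subprocess.run(
        [
            "python", "video_generate.py",
            "--input_image_path", "./assets/room.png",
            "--output_dir", f"./results-single-traj/{traj_video.stem}",
            "--traj", str(traj_video),
        ],
        check=True,
    )
```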
Flexible-view 360° scene generation given an image:

```bash
# You are free to modify the corresponding YAML configuration file by name in ./configs/examples.
python main_3dgs.py --name room2
```
To freely explore a generated scene, first run:

```bash
python 3dgs_viewer.py
```

then visit `127.0.0.1:8000`. The script recursively scans the current directory for `.ply` files, so run it after generation has finished.
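The recursive scan amounts to something like the following minimal sketch (not the viewer's actual code):

```python
from pathlib import Path

# Recursively collect every .ply file under the current directory,
# mirroring what the viewer does when it indexes generated scenes.
for ply in sorted(Path(".").rglob("*.ply")):
    print(ply)
```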
- Download the dataset to a local directory following the DL3DV repo. You may download only part of it, such as the 1K subset.
- Prepare 3DGS from the DL3DV dataset: first download the COLMAP annotations from DL3DV colmap annotation, then run reconstruction following the Gaussian Splatting repo. The final output will be laid out like:
  ```
  output/
  ├── 001dccbc1f78146a9f03861026613d8e73f39f372b545b26118e37a23c740d5f/
  │   └── point_cloud/
  │       └── iteration_7000/
  │           └── point_cloud.ply
  └── 0003dc82473fd52c53dcbdc2deb4e6e9c3548d6f8c9b03f9ea8d3c7d3ce33546/
      └── point_cloud/
          └── iteration_7000/
              └── point_cloud.ply
  ```
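  To sanity-check this layout before the next step, a minimal sketch (the output path is an example):

  ```python
  from pathlib import Path

  # Check that each reconstructed scene exposes the expected point cloud.
  output_root = Path("./gaussian-splatting/output")  # example path
  for scene in sorted(p for p in output_root.iterdir() if p.is_dir()):
      ply = scene / "point_cloud" / "iteration_7000" / "point_cloud.ply"
      print(scene.name, "ok" if ply.exists() else "MISSING")
  ```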
- Run the following to generate the broken videos rendered with 3DGS:

  ```bash
  # The paths here are examples.
  python gen_dataset.py --dataset_path ./DL3DV/DL3DV-10K/1K --output_path ./DL3DV/processed --gs_path ./gaussian-splatting/output
  ```
- Run the following to label the constructed videos:

  ```bash
  # The paths here are examples.
  python label_dataset.py --input_path ./DL3DV/processed --output_path ./train_data_v2v
  ```
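  A quick check of the labeled output before training; the assumption that videos and text labels sit side by side under the output directory is illustrative, not a guarantee of the exact layout:

  ```python
  from pathlib import Path

  # Count the produced videos and labels (layout assumed for illustration).
  root = Path("./train_data_v2v")
  print(len(list(root.rglob("*.mp4"))), "videos,", len(list(root.rglob("*.txt"))), "labels")
  ```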
- Change the following lines in `./tools/CogVideo/configs/sft_v2v.yaml`:

  ```yaml
  args:
    checkpoint_activations: True
    experiment_name: lora-disney  # your save folder name
    mode: finetune
    load: "xxx/CogVideoX-5B-I2V-SAT/transformer"  # path to the original transformer checkpoints
    save: "./ckpts_5b"  # path to the save directory
    train_data: ["train_data_v2v"]  # training data path
    valid_data: ["train_data_v2v"]  # validation data path; can be the same as train_data (not recommended)
  ```
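  Before launching, you can verify the configured paths with a small sketch (assumes PyYAML and the `args` keys shown above):

  ```python
  import yaml  # PyYAML
  from pathlib import Path

  # Report whether each configured path exists before starting a long run.
  args = yaml.safe_load(Path("./tools/CogVideo/configs/sft_v2v.yaml").read_text())["args"]
  for key in ("load", "save"):
      print(key, args[key], "->", "exists" if Path(args[key]).exists() else "missing")
  for split in ("train_data", "valid_data"):
      for p in args[split]:
          print(split, p, "->", "exists" if Path(p).exists() else "missing")
  ```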
- Run the training script:

  ```bash
  cd ./tools/CogVideo/
  bash train_video_v2v.py
  ```
- A user manual for our camera trajectories, offering support for more flexible trajectory inputs and accommodating a wider variety of trajectory types (such as RealEstate10K camera input and DL3DV-10K camera input).
- A 3DGS viewer for generated results.
- Training code for video diffusion model.
This work is built on many amazing open-source projects; thanks to all the authors!
```bibtex
@misc{chen2025flexworldprogressivelyexpanding3d,
      title={FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Synthesis},
      author={Luxi Chen and Zihan Zhou and Min Zhao and Yikai Wang and Ge Zhang and Wenhao Huang and Hao Sun and Ji-Rong Wen and Chongxuan Li},
      year={2025},
      eprint={2503.13265},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.13265},
}
```