HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation (CVPR 2025)
Please follow the instructions below to set up the environment:
# Create a new conda environment
conda create -n himor python=3.10
conda activate himor
# Install dependencies
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
pip install -r requirements.txt
pip install git+https://github.com/nerfstudio-project/gsplat.git
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
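Optionally, verify the installation with a quick import check (a minimal sanity check, assuming the packages expose the import names torch, gsplat, and pytorch3d):
python -c "import torch, gsplat, pytorch3d; print('CUDA available:', torch.cuda.is_available())"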
Download the preprocessed iPhone dataset from here and place it under ./data/iphone/. Pretrained checkpoints are available here.
We use the dataset provided by Gaussian Marbles, with foreground masks recomputed using the preprocessing scripts from Shape of Motion. Download the preprocessed dataset from here and place it under ./data/nvidia/.
To train on a custom dataset, please follow the instructions provided by Shape of Motion for preprocessing. Note that in our case, the data should be formatted following the iPhone dataset structure (a rough layout sketch is shown below).
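For reference, a rough per-scene layout sketch follows. The depth/ and flow3d_preprocessed/ paths match the commands in this README; the other entries (rgb/, camera/) are illustrative assumptions and may differ from the actual output of the Shape of Motion preprocessing:
./data/iphone/<scene-name>/
├── rgb/1x/                                  # input frames (illustrative)
├── depth/1x/                                # LiDAR depth (used by the alignment step below)
├── camera/                                  # per-frame camera parameters (illustrative)
└── flow3d_preprocessed/
    ├── depth_anything/1x/                   # monocular depth from Depth Anything
    └── aligned_depth_anything_lidar/1x/     # written by preproc/align_monodepth_with_lidar.py (see below)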
To visualize results using an interactive viewer, first download the pretrained checkpoints, then run the following command:
python run_rendering.py --ckpt-path <path-to-ckpt>
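For example, with the paper-windmill checkpoint produced by training (or a downloaded checkpoint placed at the same path):
python run_rendering.py --ckpt-path ./outputs/paper-windmill/checkpoints/last.ckpt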
For better reconstruction, especially in the background, train with refined camera poses:
python run_training.py --work-dir ./outputs/paper-windmill --port 8888 data:iphone --data.data-dir ./data/iphone/paper-windmill --data.depth_type depth_anything_colmap --data.camera_type refined
In the paper, we report results using the original camera poses:
# First, align monocular depth with LiDAR depth.
python preproc/align_monodepth_with_lidar.py --lidar_depth_dir ./data/iphone/paper-windmill/depth/1x/ --input_monodepth_dir ./data/iphone/paper-windmill/flow3d_preprocessed/depth_anything/1x --output_monodepth_dir ./data/iphone/paper-windmill/flow3d_preprocessed/aligned_depth_anything_lidar/1x --matching_pattern "0*"
# Then, run training.
python run_training.py --work-dir ./outputs/paper-windmill --port 8888 data:iphone --data.data-dir ./data/iphone/paper-windmill --data.depth_type depth_anything_lidar --data.camera_type original
For the Nvidia dataset, train with the following command:
python run_training.py --work-dir ./outputs/Balloon1 --num_fg 20000 --num_bg 40000 --num_epochs 800 --port 8888 data:nvidia --data.data-dir ./data/nvidia/Balloon1 --data.depth_type lidar --data.camera_type original
Ensure that the checkpoint file outputs/<dataset-name>/checkpoints/last.ckpt is available. You can obtain it either by training the model or by downloading the provided checkpoints.
Use the checkpoint to render images:
python run_evaluation.py --work-dir outputs/paper-windmill/ --ckpt-path outputs/paper-windmill/checkpoints/last.ckpt data:iphone --data.data-dir ./data/iphone/paper-windmill
Evaluate the rendered images to compute quantitative metrics:
# For the iPhone dataset
PYTHONPATH="." python scripts/evaluate_iphone.py --data_dir ./data/iphone/paper-windmill --result_dir ./outputs/paper-windmill/
# For the Nvidia dataset
PYTHONPATH="." python scripts/evaluate_nvidia.py --data_dir ./data/nvidia/Balloon1/ --result_dir ./outputs/Balloon1/
@inproceedings{liang2025himor,
author = {Liang, Yiming and Xu, Tianhan and Kikuchi, Yuta},
title = {{H}i{M}o{R}: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation},
booktitle = {CVPR},
year = {2025},
}
Our implementation builds on Shape of Motion. We thank the authors for open-sourcing their code.