
HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation (CVPR 2025)


Installation

Please follow the instructions below to set up the environment:

# Create a new conda environment
conda create -n himor python=3.10
conda activate himor

# Install dependencies
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
pip install -r requirements.txt
pip install git+https://github.com/nerfstudio-project/gsplat.git
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
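
As an optional sanity check, the one-liner below simply verifies that the main dependencies import and that CUDA is visible; it is not part of the official setup:

python -c "import torch, gsplat, pytorch3d; print('CUDA available:', torch.cuda.is_available())"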

Data preparation

iPhone Dataset

Download the preprocessed iPhone dataset from here and place it under ./data/iphone/. Pretrained checkpoints are available here.
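
Based on the paths used by the commands later in this README, each scene directory should look roughly like the sketch below (shown for paper-windmill). This is a partial sketch: only the depth-related directories referenced by the commands are listed, and the full dataset contains additional files (e.g. images and camera parameters).

data/iphone/paper-windmill/
├── depth/1x/                              # LiDAR depth maps
└── flow3d_preprocessed/
    ├── depth_anything/1x/                 # monocular depth estimates
    └── aligned_depth_anything_lidar/1x/   # written by the alignment script in the Training section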

Nvidia Dataset

We use the dataset provided by Gaussian Marbles, with foreground masks recomputed using the preprocessing scripts from Shape of Motion. Download the preprocessed dataset from here and place it under ./data/nvidia/.

Custom Dataset

To train on a custom dataset, please follow the preprocessing instructions provided by Shape of Motion. Note that in our case, the data should be formatted following the iPhone dataset structure (see the layout sketched above).

Visualization

To visualize results using an interactive viewer, first download the pretrained checkpoints, then run the following command:

python run_rendering.py --ckpt-path <path-to-ckpt>
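
For example, assuming the paper-windmill checkpoint has been downloaded to the default location used in the Evaluation section (outputs/&lt;dataset-name&gt;/checkpoints/last.ckpt):

python run_rendering.py --ckpt-path ./outputs/paper-windmill/checkpoints/last.ckpt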

Training

iPhone Dataset

For better reconstruction, especially in the background, train with refined camera poses:

python run_training.py --work-dir ./outputs/paper-windmill --port 8888 data:iphone --data.data-dir ./data/iphone/paper-windmill --data.depth_type depth_anything_colmap --data.camera_type refined

In the paper, we report results using the original camera poses:

# First, align monocular depth with LiDAR depth.
python preproc/align_monodepth_with_lidar.py --lidar_depth_dir ./data/iphone/paper-windmill/depth/1x/ --input_monodepth_dir ./data/iphone/paper-windmill/flow3d_preprocessed/depth_anything/1x --output_monodepth_dir ./data/iphone/paper-windmill/flow3d_preprocessed/aligned_depth_anything_lidar/1x --matching_pattern "0*"

# Then, run training. 
python run_training.py --work-dir ./outputs/paper-windmill --port 8888 data:iphone --data.data-dir ./data/iphone/paper-windmill --data.depth_type depth_anything_lidar --data.camera_type original
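
To train several scenes back to back, a plain shell loop over scene directories works. The loop below uses the refined-camera settings from the first command above; the scene names are only examples and should match the directories present under ./data/iphone/:

# Train multiple iPhone scenes in sequence (scene names are illustrative)
for scene in paper-windmill apple block; do
    python run_training.py --work-dir ./outputs/$scene --port 8888 data:iphone --data.data-dir ./data/iphone/$scene --data.depth_type depth_anything_colmap --data.camera_type refined
done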

Nvidia Dataset

Train with the following command:

python run_training.py --work-dir ./outputs/Balloon1 --num_fg 20000 --num_bg 40000 --num_epochs 800 --port 8888 data:nvidia --data.data-dir ./data/nvidia/Balloon1 --data.depth_type lidar --data.camera_type original 
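
The same pattern extends to other Nvidia scenes; the names below are examples and assume the corresponding directories exist under ./data/nvidia/:

# Train multiple Nvidia scenes in sequence (scene names are illustrative)
for scene in Balloon1 Balloon2 Jumping; do
    python run_training.py --work-dir ./outputs/$scene --num_fg 20000 --num_bg 40000 --num_epochs 800 --port 8888 data:nvidia --data.data-dir ./data/nvidia/$scene --data.depth_type lidar --data.camera_type original
done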

Evaluation

Ensure that the checkpoint file outputs/&lt;dataset-name&gt;/checkpoints/last.ckpt is available. You can obtain it either by training the model yourself or by downloading the provided checkpoints.
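
For example, to check for the paper-windmill checkpoint:

test -f outputs/paper-windmill/checkpoints/last.ckpt && echo "checkpoint found" || echo "checkpoint missing"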

Render Images

Use the checkpoint to render images:

python run_evaluation.py --work-dir outputs/paper-windmill/ --ckpt-path outputs/paper-windmill/checkpoints/last.ckpt data:iphone --data.data-dir ./data/iphone/paper-windmill

Compute Metrics

Evaluate the rendered images to compute quantitative metrics:

# For the iPhone dataset
PYTHONPATH="." python scripts/evaluate_iphone.py --data_dir ./data/iphone/paper-windmill --result_dir ./outputs/paper-windmill/ 

# For the Nvidia dataset
PYTHONPATH="." python scripts/evaluate_nvidia.py --data_dir ./data/nvidia/Balloon1/ --result_dir ./outputs/Balloon1/ 

Citation

@inproceedings{liang2025himor,
    author    = {Liang, Yiming and Xu, Tianhan and Kikuchi, Yuta},
    title     = {{H}i{M}o{R}: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation},
    booktitle = {CVPR},
    year      = {2025},
}

Acknowledgement

Our implementation builds on Shape of Motion. We thank the authors for open-sourcing their code.
