The Official PyTorch Implementation of
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
Authors: Hualie Jiang, Zhiqiang Lou, Laiyan Ding, Rui Xu, Minglang Tan, Wenjie Jiang and Rui Huang
Stereo matching is a key technique for metric depth estimation in computer vision and robotics.
Real-world challenges like occlusion and texture-less regions hinder accurate disparity estimation from binocular matching cues. Recently, monocular relative depth estimation has shown remarkable generalization using vision foundation models. Thus, to facilitate robust stereo matching with monocular depth cues, we incorporate a robust monocular relative depth model into the recurrent stereo-matching framework, building a new framework for depth foundation model-based stereo matching, DEFOM-Stereo.
In the feature extraction stage, we construct the combined context and matching feature encoder by integrating features from conventional CNNs and DEFOM. In the update stage, we use the depth predicted by DEFOM to initialize the recurrent disparity and introduce a scale update module to refine the disparity at the correct scale. DEFOM-Stereo is verified to have much stronger zero-shot generalization than state-of-the-art methods. Moreover, DEFOM-Stereo achieves top performance on the KITTI 2012, KITTI 2015, Middlebury, and ETH3D benchmarks.
- We propose a novel recurrent stereo-matching framework incorporating monocular depth cues from a depth foundation model to improve robustness.
- We develop a simple technique that utilizes pre-trained DEFOM features to construct stronger combined feature and context encoders.
- We design a recurrent scale update module with a scale lookup mechanism, which recovers accurate pixel-wise scales for the coarse DEFOM depth (a conceptual sketch follows this list).
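The snippet below is a minimal, purely illustrative sketch of these two ideas: fusing CNN and DEFOM (ViT) features into one encoder, and turning the scale-ambiguous DEFOM prediction into an initial disparity that the scale update module later refines. The names (CombinedEncoder, init_disparity_from_mono_depth) and channel sizes are assumptions of this sketch, not the repository's actual modules.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedEncoder(nn.Module):
    # Hypothetical fusion of CNN features with frozen DEFOM (ViT) features:
    # the ViT map is resampled to the CNN resolution, concatenated along
    # channels, and projected by a 1x1 convolution.
    def __init__(self, cnn_dim=128, vit_dim=384, out_dim=256):
        super().__init__()
        self.fuse = nn.Conv2d(cnn_dim + vit_dim, out_dim, kernel_size=1)

    def forward(self, cnn_feat, vit_feat):
        vit_feat = F.interpolate(vit_feat, size=cnn_feat.shape[-2:],
                                 mode="bilinear", align_corners=True)
        return self.fuse(torch.cat([cnn_feat, vit_feat], dim=1))

def init_disparity_from_mono_depth(relative_inv_depth, global_scale=1.0):
    # The DEFOM prediction is scale-ambiguous; a coarse global scale turns it
    # into an initial disparity, and the recurrent scale update module then
    # refines the scale per pixel during the iterative updates.
    return global_scale * relative_inv_depth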
Create the environment and install the dependencies
conda env create -f environment.yaml
conda activate defomstereo
pip install -r requirements.txt
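An optional quick check that PyTorch and a CUDA device are visible inside the new environment:

# Prints the installed PyTorch version and whether a CUDA device is available.
import torch
print(torch.__version__, torch.cuda.is_available())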
The project requires the following datasets:
KITTI-2012 | KITTI-2015 | Middlebury | ETH3D | InStereo2K |
Virtual KITTI 2 | SceneFlow | TartanAir | CREStereo Dataset | FallingThings |
Sintel Stereo | HR-VS | 3D Ken Burns | IRS Dataset | Booster Dataset |
.
└── datasets
├── 3dkenburns
│ ├── asdf-flying
│ ├── asdf-flying-depth
│ └── ...
├── Booster_Dataset
│ ├── test
│ └── train
├── CreStereo
│ ├── hole
│ ├── reflective
│ ├── shapenet
│ └── tree
├── ETH3D
│ ├── two_view_testing
│ ├── two_view_training
│ └── two_view_training_gt
├── FallingThings
│ └── fat
├── HRVS
│ └── carla-highres
├── InStereo2K
│ ├── part1
│ ├── part2
│ ├── part3
│ ├── part4
│ ├── part5
│ └── test
├── IRSDataset
│ ├── Home
│ ├── Office
│ ├── Restaurant
│ └── Store
├── KITTI12
│ ├── testing
│ └── training
├── KITTI15
│ ├── testing
│ └── training
├── Middlebury
│ ├── 2005
│ ├── 2006
│ ├── 2014
│ ├── 2021
│ └── MiddEval3
├── SceneFlow
│ ├── Driving
│ ├── FlyingThings3D
│ └── Monkaa
├── SintelStereo
│ └── training
├── TartanAir
│ ├── abandonedfactory
│ ├── abandonedfactory_night
│ └── ...
└── VKITTI2
├── Scene01
├── Scene02
├── Scene06
├── Scene18
└── Scene20
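Optionally, a small standalone snippet (not part of the repository) can confirm that the expected top-level folders from the layout above are in place:

# Checks that the expected top-level dataset folders exist under ./datasets.
from pathlib import Path

expected = [
    "3dkenburns", "Booster_Dataset", "CreStereo", "ETH3D", "FallingThings",
    "HRVS", "InStereo2K", "IRSDataset", "KITTI12", "KITTI15", "Middlebury",
    "SceneFlow", "SintelStereo", "TartanAir", "VKITTI2",
]
missing = [name for name in expected if not (Path("datasets") / name).is_dir()]
print("missing:", missing or "none")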
bash scripts/download_models.sh
The pretrained models are available on Google Drive and can also be downloaded manually.
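Once downloaded, a checkpoint can be opened with standard PyTorch calls to confirm it is intact; the assumption that the file stores a plain state dict is ours, not guaranteed by the repository:

# Loads a downloaded checkpoint on CPU and prints a few of its keys.
# Assumes the .pth file holds a plain state dict; adjust if the format differs.
import torch
state = torch.load("checkpoints/defomstereo_vitl_sceneflow.pth", map_location="cpu")
print(type(state).__name__, list(state)[:3])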
bash scripts/evaluate.sh
bash scripts/make_submission.sh
bash scripts/download_dav2.sh
bash scripts/train_sceneflow_vits.sh
bash scripts/train_sceneflow_vitl.sh
bash scripts/train_kitti.sh
bash scripts/train_middlebury.sh
bash scripts/train_eth3d.sh
bash scripts/train_rvc.sh
python demo.py --restore_ckpt checkpoints/defomstereo_vitl_sceneflow.pth
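For use from your own Python script rather than via demo.py, loading a stereo pair into the 1xCxHxW float tensors that RAFT-Stereo-style models typically expect looks roughly like the snippet below. The model construction and call at the end are left as placeholder comments because the class name and signature there are assumptions; consult demo.py for the actual interface.

# Hypothetical preprocessing sketch; see demo.py for the project's actual interface.
import numpy as np
import torch
from PIL import Image

def load_image(path, device="cuda"):
    # HWC uint8 image -> 1xCxHxW float32 tensor.
    img = np.array(Image.open(path).convert("RGB")).astype(np.float32)
    return torch.from_numpy(img).permute(2, 0, 1)[None].to(device)

left = load_image("left.png")    # replace with your own image paths
right = load_image("right.png")

# Placeholder for the model call (names are assumptions, not this project's API):
# model = DEFOMStereo(args); model.load_state_dict(torch.load(ckpt)); model.cuda().eval()
# with torch.no_grad():
#     disparity = model(left, right)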
The project is based on RAFT-Stereo and Depth Anything V2, and we sincerely thank their authors for open-sourcing their excellent work. We also thank the CVPR reviewers and AC for their valuable feedback and recognition of our work.
Please cite our paper if you find our work useful in your research.
@inproceedings{jiang2025defom,
  title={DEFOM-Stereo: Depth Foundation Model Based Stereo Matching},
  author={Jiang, Hualie and Lou, Zhiqiang and Ding, Laiyan and Xu, Rui and Tan, Minglang and Jiang, Wenjie and Huang, Rui},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}