Project Page | Paper | ArXiv | Video
This is the official repository of our paper accepted at IROS 2025:
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model
Authors: Jannik Endres, Oliver Hahn, Charles Corbière, Simone Schaub-Meyer, Stefan Roth, Alexandre Alahi
- 09/08/2025: Our paper has been accepted at IROS 2025! Check out the updated paper on arXiv and watch the video.
- 11/04/2025: Our code is now publicly available in this repository.
- 30/03/2025: Our paper is available on arXiv.
A shared depth foundation model (purple) extracts representations from the top and bottom images. An omnidirectional stereo matching head (pink) then predicts disparity from these image features: the intermediate representations and relative depth maps of both images are adapted into multi-scale feature maps for the iterative matching head, which constructs its cost volume via vertical warping and predicts a disparity map.
Training proceeds in two stages. In stage A (blue), we adapt the stereo matching head to the omnidirectional data and the foundation model features (foundation model frozen) using a conventional stereo matching loss L_A. In stage B (orange), we fine-tune the foundation model decoder together with the stereo matching head using a scale-invariant logarithmic loss L_B. Frozen and trainable modules are denoted with a snowflake and a fire symbol, respectively.
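For reference, here is a minimal PyTorch sketch of a scale-invariant logarithmic loss in the spirit of L_B. The exact weighting, masking, and scaling in our implementation may differ; `lambda_weight` is an illustrative hyperparameter:

```python
import torch

def silog_loss(pred_depth, gt_depth, valid_mask, lambda_weight=0.85):
    """Scale-invariant logarithmic loss (Eigen et al.); sketch of L_B.

    pred_depth, gt_depth: (B, H, W) tensors of positive depth values.
    valid_mask: (B, H, W) boolean mask of pixels with ground truth.
    """
    d = torch.log(pred_depth[valid_mask]) - torch.log(gt_depth[valid_mask])
    # Variance-like term: penalizes log-depth errors while partially
    # discounting a global scale offset between prediction and ground truth.
    return torch.sqrt((d ** 2).mean() - lambda_weight * d.mean() ** 2)
```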
We use Hydra for configuration management and Weights & Biases for comprehensive experiment tracking and visualization.
conda create -n dfi-omnistereo python=3.11
conda activate dfi-omnistereo
git clone git@github.com:vita-epfl/DFI-OmniStereo.git
cd DFI-OmniStereo
pip install -r requirements.txt
Download the Helvipad dataset and store it at a location of your choice, e.g., ./data/helvipad.
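After extraction, the dataset layout might look roughly as follows. This tree is an assumption based on the Helvipad dataset description; consult the dataset's own documentation for the authoritative structure:

```
./data/helvipad/
├── train/
│   ├── images_top/
│   ├── images_bottom/
│   ├── depth_maps/
│   └── disparity_maps/
├── val/
└── test/
```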
IGEV-Stereo (SceneFlow weights): Create the directory below and download the pretrained SceneFlow weights from the Google Drive link provided by the IGEV-Stereo authors:
mkdir -p ./src/models/dfi_omnistereo/pretrained_models/igev_stereo
Place the downloaded file into the directory created above.
Depth Anything V2 Base: Download the Depth-Anything-V2-Base model provided by Depth Anything V2:
mkdir -p ./src/models/dfi_omnistereo/pretrained_models/depth_anything && \
wget -O ./src/models/dfi_omnistereo/pretrained_models/depth_anything/depth_anything_v2_vitb.pth \
"https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true"
DFI-OmniStereo main checkpoint: Download our pretrained model checkpoint:
mkdir -p ./src/models/dfi_omnistereo/pretrained_models/dfi_omnistereo && \
wget -O ./src/models/dfi_omnistereo/pretrained_models/dfi_omnistereo/dfi_omnistereo_helvipad.pth "https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/4557/dfi_omnistereo_helvipad.pth"
To train the model starting from the pretrained depth foundation model and IGEV-Stereo weights (Stage A), run the following command:
cd src
python train.py \
--debug=false \
--exp_name=Stage-A \
--dataset_root=./data/helvipad/
All other parameters are set to their default values for Stage A training.
To continue training with Stage B, restore the best checkpoint from Stage A. For example, if your Stage A checkpoint is saved at models/dfi_omnistereo/pretrained_models/best-ckpt-stage-a.pth, run:
python train.py \
--debug=false \
--exp_name=Stage-B \
--dataset_root=./data/helvipad/ \
--restore_ckpt=./models/dfi_omnistereo/pretrained_models/best-ckpt-stage-a.pth \
--lr=0.00002 \
--epochs=12 \
--train_batch_size=1 \
--train_depth_anything=true \
--use_silog_loss=true
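Conceptually, --train_depth_anything=true unfreezes the foundation model decoder for Stage B while the ViT encoder stays frozen. Below is a minimal PyTorch sketch of this kind of selective fine-tuning; the module names are illustrative, not the repository's actual attributes:

```python
import torch.nn as nn

def set_stage_b_trainability(model: nn.Module, train_depth_anything: bool = True):
    """Freeze the encoder; optionally unfreeze the decoder (hypothetical names)."""
    for p in model.foundation_encoder.parameters():
        p.requires_grad = False                 # encoder stays frozen
    for p in model.foundation_decoder.parameters():
        p.requires_grad = train_depth_anything  # fine-tuned in Stage B
    for p in model.stereo_head.parameters():
        p.requires_grad = True                  # matching head always trainable
```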
To evaluate our model using the main checkpoint and compute all metrics, including the Left-Right Consistency Error (LRCE), run:
cd src
python evaluate.py \
--debug=false \
--exp_name=Evaluation \
--dataset_root=./data/helvipad/ \
--restore_ckpt=./models/dfi_omnistereo/pretrained_models/dfi_omnistereo/dfi_omnistereo_helvipad.pth \
--calc_lrce=true
Note: Setting --calc_lrce=true enables LRCE evaluation, which increases computation time.
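As a rough illustration of the idea behind LRCE on equirectangular images, where the leftmost and rightmost columns are physically adjacent, a sketch is given below; the exact definition used in metrics.py may differ:

```python
import torch

def lrce(pred, gt, valid_left, valid_right):
    """Left-right consistency error across the 360° seam (sketch).

    pred, gt: (B, H, W) disparity or depth maps.
    valid_left, valid_right: (B, H) boolean masks for the border columns.
    """
    valid = valid_left & valid_right
    seam_pred = pred[..., 0] - pred[..., -1]  # predicted seam discontinuity
    seam_gt = gt[..., 0] - gt[..., -1]        # ground-truth seam discontinuity
    return (seam_pred - seam_gt).abs()[valid].mean()
```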
To generate inference results on selected samples from the Helvipad dataset, run the following command:
cd src
python infer.py \
--infer_name=helvipad_paper_results \
--dataset_root=./data/helvipad/ \
--restore_ckpt=./models/dfi_omnistereo/pretrained_models/dfi_omnistereo/dfi_omnistereo_helvipad.pth \
--images test-20240120_REC_06_IN-0042 test-20240124_REC_03_OUT-0676 test-20240124_REC_08_NOUT-0717
This command will process the following frames (all of which are part of the test set):
- frame 0042 from scene 20240120_REC_06_IN
- frame 0676 from scene 20240124_REC_03_OUT
- frame 0717 from scene 20240124_REC_08_NOUT
The results, as well as the top and bottom images, will be saved to src/models/dfi_omnistereo/inference_results/helvipad_paper_results.
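Each --images entry concatenates the split, the scene name, and the frame index as <split>-<scene>-<frame>. A small helper to build such identifiers (purely illustrative; infer.py parses them internally):

```python
def image_id(split: str, scene: str, frame: int) -> str:
    """Build an identifier such as 'test-20240120_REC_06_IN-0042'."""
    return f"{split}-{scene}-{frame:04d}"

ids = [
    image_id("test", "20240120_REC_06_IN", 42),
    image_id("test", "20240124_REC_03_OUT", 676),
    image_id("test", "20240124_REC_08_NOUT", 717),
]
```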
To evaluate our model on real-world examples from the 360SD-Net dataset:
- Download the real-world top and bottom images from the official repo.
- Place the data in a directory of your choice, e.g., ./data/360sd.
- Run the following command to perform inference:
cd src
python infer.py \
--infer_name=360SD_paper_results \
--dataset_root=./data/360sd/ \
--restore_ckpt=./models/dfi_omnistereo/pretrained_models/dfi_omnistereo/dfi_omnistereo_helvipad.pth \
--dataset=360SD \
--min_disp_deg=0.0048 \
--max_disp_deg=178 \
--max_disp=512 \
--images hall room stairs
This will run inference on the hall, room, and stairs scenes.
The results will be saved in src/models/dfi_omnistereo/inference_results/360SD_paper_results.
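A note on the disparity flags above: the bounds are given in degrees of polar angle, and for an equirectangular image whose height spans a 180° vertical field of view, degrees convert to pixels as disp_px = disp_deg · H / 180. A quick sanity check, assuming a height of H = 512 pixels for the 360SD-Net images:

```python
def deg_to_px(disp_deg: float, height: int) -> float:
    """Convert angular disparity to pixels for a 180° vertical FoV."""
    return disp_deg * height / 180.0

H = 512  # assumed image height of the 360SD-Net examples
print(deg_to_px(178.0, H))   # ~506.3 px, consistent with --max_disp=512
print(deg_to_px(0.0048, H))  # ~0.014 px
```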
Below is an overview of the repository structure, with descriptions for key files and directories:
├── docs/                                        # Documentation assets (e.g., images).
│   ├── architecture.png                         # Diagram of the method's architecture.
│   └── overview.png                             # Overview image of the project.
├── src/                                         # Source code for the project.
│   ├── conf/                                    # Configuration files.
│   │   ├── model/                               # Model-specific configurations.
│   │   │   └── dfi_omnistereo.yaml              # Model-specific configuration.
│   │   └── config.yaml                          # General configuration file.
│   ├── general/                                 # General utilities and helper scripts.
│   │   ├── augmentor.py                         # Data augmentation utilities.
│   │   ├── conversion.py                        # Data conversion utilities.
│   │   ├── metrics.py                           # Evaluation metrics.
│   │   ├── stereo_datasets.py                   # Dataset handling for stereo data.
│   │   └── utils.py                             # Miscellaneous utility functions.
│   ├── models/                                  # Model implementations.
│   │   ├── dfi_omnistereo/                      # DFI-OmniStereo-specific models.
│   │   │   ├── depth_foundation_model/          # Depth foundation model part.
│   │   │   │   ├── dpt_dinov2/                  # DPT (see Depth Anything).
│   │   │   │   ├── torchhub/                    # ViT (see DINOv2).
│   │   │   │   └── depth_anything.py            # Depth Anything model.
│   │   │   ├── omnidirectional_stereo_matching/ # Omnidirectional stereo matching part.
│   │   │   │   ├── depth_anything_extractor.py  # Depth feature extraction.
│   │   │   │   ├── extractor.py                 # General feature extraction logic.
│   │   │   │   ├── geometry.py                  # Geometry-related computations.
│   │   │   │   ├── igev_stereo.py               # IGEV-Stereo iterative matching head.
│   │   │   │   ├── submodule.py                 # Submodule utilities.
│   │   │   │   └── update.py                    # Iterative update logic.
│   │   │   └── dfi_model.py                     # Main DFI-OmniStereo model definition.
│   │   └── base_model.py                        # Base model class.
│   ├── evaluate.py                              # Script for model evaluation.
│   ├── infer.py                                 # Script for inference.
│   └── train.py                                 # Script for training the model.
├── .gitignore                                   # Specifies files to ignore in Git.
├── LICENSE                                      # License information for the repository.
├── README.md                                    # Project README file (this file).
└── requirements.txt                             # Python dependencies for the project.
We thank the authors of Depth Anything, DINOv2, IGEV-Stereo, and RAFT-Stereo for releasing their code.
@inproceedings{endres2025dfiomnistereo,
author = {Endres, Jannik and Hahn, Oliver and Corbière, Charles and Schaub-Meyer, Simone and Roth, Stefan and Alahi, Alexandre},
title = {Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model},
booktitle = {2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year = {2025},
organization = {IEEE}
}