MODEST: Monocular Depth Estimation and Segmentation for Transparent Object (ICRA 2025)

This is the official repository of the ICRA 2025 paper "Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion".

Abstract

Transparent object perception is indispensable for numerous robotic tasks. However, accurately segmenting and estimating the depth of transparent objects remain challenging due to complex optical properties. Existing methods primarily delve into only one task using extra inputs or specialized sensors, neglecting the valuable interactions among tasks and the subsequent refinement process, leading to suboptimal and blurry predictions. To address these issues, we propose a monocular framework, which is the first to excel in both segmentation and depth estimation of transparent objects, with only a single image input. Specifically, we devise a novel semantic and geometric fusion module, effectively integrating the multi-scale information between tasks. In addition, drawing inspiration from human perception of objects, we further incorporate an iterative strategy, which progressively refines initial features for clearer results. Experiments on two challenging synthetic and real-world datasets demonstrate that our model surpasses state-of-the-art monocular, stereo, and multi-view methods by a large margin of about 38.8%-46.2% with only a single RGB input.

Requirements

We have tested the code on Ubuntu 20.04 with an NVIDIA GeForce RTX 4090, Python 3.8, and CUDA 11.1. It may also work on other systems.

Installation

Set up a virtual environment

python3 -m venv modest
source modest/bin/activate

Install pip dependencies

pip install -r requirements.txt
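
Once the dependencies are installed, a quick check like the one below can show how closely your environment matches the tested setup (Python 3.8, CUDA 11.1). This is only a sketch and not part of the repository: the function name `environment_report` is hypothetical, and it assumes PyTorch is among the installed dependencies (the code builds on DPT, which uses PyTorch). If PyTorch is missing, the script reports that instead of failing.

```python
import platform
import sys

# Python version the authors report testing with.
TESTED_PYTHON = (3, 8)


def environment_report():
    """Collect basic facts about the current environment (hypothetical helper)."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "matches_tested_python": sys.version_info[:2] == TESTED_PYTHON,
        "os": platform.platform(),
    }
    try:
        # Assumption: PyTorch is installed via requirements.txt.
        import torch

        report["torch"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # CUDA version torch was built with
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
    return report


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

A mismatch with the tested versions does not necessarily mean the code will fail; it is just a useful first thing to check when something goes wrong.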

Download the datasets

The synthetic dataset Syn-TODD for transparent object perception can be downloaded from this repository.

The real-world dataset ClearPose can be downloaded from this repository.

Download the model weights

We provide pre-trained model weights for the Syn-TODD dataset here.

Weights trained on the real-world ClearPose dataset are available here.

Modify the configuration file

Edit the parameters in config/config.json: specify the dataset type, all paths, the batch size, and so on. Configure the wandb section if you want to visualize the training run.
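
If you switch between datasets often, editing the JSON by hand can get tedious. The sketch below shows one way to script the change. It is not part of the repository: `update_config` is a hypothetical helper, and the key names in the usage comment are placeholders, so check config/config.json for the real ones. The helper deliberately refuses to create keys that are not already in the file, so a mistyped name raises an error instead of being silently ignored by the training code.

```python
import json
from pathlib import Path


def update_config(path, overrides):
    """Apply dotted-key overrides to an existing JSON config and save it.

    Hypothetical helper: only keys already present in the file may be
    overridden; unknown keys raise KeyError.
    """
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node[key]  # KeyError if a parent section is missing
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = value
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


# Example call (key names are placeholders, not the repository's actual keys):
# update_config("config/config.json", {"General.batch_size": 4})
```

Note that the file is rewritten with 4-space indentation, so its formatting may differ from the original after the first call.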

Training

To train the model on Syn-TODD or ClearPose, simply run:

python train.py

Evaluation

To evaluate the model on the test set, run:

python test.py

Inference

To run inference, set the input image path in inference.py and run:

python inference.py

Acknowledgement

Our code is largely built upon DPT. We thank the authors for their well-organized open-source code and their contributions to the community.

Citation

If you find MODEST useful in your research or applications, please consider citing it:

@article{liu2025monocular,
  title={Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion},
  author={Liu, Jiangyuan and Ma, Hongxuan and Guo, Yuxin and Zhao, Yuhao and Zhang, Chi and Sui, Wei and Zou, Wei},
  journal={arXiv preprint arXiv:2502.14616},
  year={2025}
}
