Isaac ROS DNN Stereo Depth

NVIDIA-accelerated, deep learned stereo disparity estimation

Webinar Available

Learn how to use this package by watching our on-demand webinar: Using ML Models in ROS 2 to Robustly Estimate Distance to Obstacles

Overview

Deep Neural Network (DNN)–based stereo models have become essential for depth estimation because they overcome many of the fundamental limitations of classical and geometry-based stereo algorithms.

Traditional stereo matching relies on explicitly finding pixel correspondences between left and right images using handcrafted features. While effective in well-textured, ideal conditions, these approaches often fail in “ill-posed” regions such as areas with reflections, specular highlights, texture-less surfaces, repetitive patterns, occlusions, or even minor camera calibration errors. In such cases, classical algorithms may produce incomplete or inaccurate depth maps, or be forced to discard information entirely, especially when context-dependent filtering is not possible.
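To make the failure mode concrete, here is a minimal sketch of classical correspondence search on a single rectified scanline, using a sum-of-absolute-differences (SAD) window cost. It is an illustration only and is not code from this package; the function name `sad_disparity_row` and its parameters are hypothetical.

```python
def sad_disparity_row(left, right, max_disp, half_win=1):
    """Winner-takes-all disparity for one rectified scanline (illustrative only).

    left, right: lists of pixel intensities for the same image row.
    For each left pixel x, the candidate match in the right image is x - d.
    """
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0.0
            for k in range(-half_win, half_win + 1):
                xl = min(max(x + k, 0), n - 1)      # clamp window at the borders
                xr = min(max(x + k - d, 0), n - 1)  # candidate pixel in the right row
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:  # strict comparison: ties keep the smallest d
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Textured row: the left row is the right row shifted by 2 pixels.
right_row = [10, 50, 20, 80, 30, 90, 40, 70, 60, 15]
left_row = [10, 10] + right_row[:-2]
print(sad_disparity_row(left_row, right_row, max_disp=3))

# Texture-less row: every candidate has zero cost, so the match is ambiguous.
print(sad_disparity_row([5] * 8, [5] * 8, max_disp=3))
```

On the textured row the interior pixels recover the true shift of 2. On the texture-less row every disparity has the same cost, so the result is decided by the tie-break rather than by the scene, which is the kind of ill-posed region described above.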

DNN-based stereo methods learn rich, hierarchical feature representations and context-aware matching costs directly from data. These models leverage semantic understanding and global scene context to infer depth, even in challenging environments where traditional correspondence measures break down. Through training, DNNs can implicitly account for real-world imperfections such as:

calibration errors
exposure differences
hardware noise

Training also improves a DNN's ability to recognize and handle difficult regions such as reflections or transparent surfaces. The result is more robust, accurate, and dense depth predictions.

These advances are critical for robotics and autonomous systems, enabling applications where both speed and accuracy of depth perception are essential, such as:

precise robotic arm manipulation
reliable obstacle avoidance and navigation
robust target tracking in dynamic or cluttered environments

In the ill-posed regions described above, DNN-based stereo methods generally outperform classical techniques, which makes them the preferred choice for modern depth perception tasks.

The figure above illustrates this advantage by comparing the output of a classical stereo algorithm, SGM, with that of two DNN-based methods, ESS and FoundationStereo.

SGM produces a noisy, error-prone disparity map, while ESS and FoundationStereo produce much smoother and more accurate ones. A closer look shows that FoundationStereo is the most accurate of the three: it gives smoother estimates for the plant in the distance and the railings on the left. Overall, FoundationStereo outperforms ESS, which in turn outperforms SGM, in accuracy and quality.
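A disparity map becomes metric depth through the standard rectified-stereo relation depth = focal length × baseline / disparity. The sketch below shows that conversion for a single pixel. The function name and the camera values in the example (480 px focal length, 0.15 m baseline) are assumptions chosen for illustration and are not taken from this repository.

```python
import math


def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity in pixels to depth in meters for a rectified stereo pair.

    Non-positive disparities carry no depth information and map to infinity.
    """
    if disparity_px <= 0.0:
        return math.inf
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 480 px focal length, 0.15 m baseline.
for d in (40.0, 20.0, 10.0):
    print(f"disparity {d:4.1f} px -> depth {disparity_to_depth(d, 480.0, 0.15):.2f} m")
```

Because depth is inversely proportional to disparity, a fixed disparity error costs far more depth accuracy at long range than at short range. This is one reason smooth, accurate disparity on distant objects matters.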

DNN-based stereo systems begin by passing the left and right images through shared convolutional backbones to extract multi-scale feature maps that encode both texture and semantic information. These feature maps are then compared across candidate disparities by constructing a learnable cost volume, which represents the matching likelihood of each pixel at each disparity. Successive 3D convolution stages (or 2D convolution plus aggregation) then regularize and refine this cost volume, combining strong local cues such as edges and textures with global scene context such as object shapes and layout priors to resolve ambiguities. Finally, a soft-argmax or classification layer converts the refined cost volume into a dense disparity map. This is often followed by lightweight refinement modules that enforce sub-pixel accuracy and respect learned priors (for example, smoothness within objects and sharp transitions at boundaries), yielding a coherent estimate in scenarios where classical algorithms falter.
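The last two stages of that pipeline, the cost volume and the soft-argmax readout, can be sketched in a few lines. This is a toy example under stated assumptions: the features are hand-written numbers rather than learned feature maps, the cost is a plain L1 distance with no learned regularization, and the names `cost_volume_row` and `soft_argmax_disparity` are hypothetical. It does not reflect the ESS or FoundationStereo implementations.

```python
import math


def cost_volume_row(left_feat, right_feat, max_disp):
    """Build cost[x][d]: L1 distance between left feature x and right feature x - d.

    Candidates that fall outside the image get an infinite cost.
    """
    volume = []
    for x, lf in enumerate(left_feat):
        costs = []
        for d in range(max_disp + 1):
            if x - d < 0:
                costs.append(math.inf)
            else:
                rf = right_feat[x - d]
                costs.append(sum(abs(a - b) for a, b in zip(lf, rf)))
        volume.append(costs)
    return volume


def soft_argmax_disparity(costs, temperature=1.0):
    """Expected disparity under softmax(-cost / temperature), which is sub-pixel capable."""
    logits = [-c / temperature for c in costs]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


# Toy one-channel features; the left row is the right row shifted by 2 pixels.
right_feat = [(0.0,), (1.0,), (4.0,), (9.0,), (16.0,), (25.0,)]
left_feat = [right_feat[0], right_feat[0]] + right_feat[:-2]
volume = cost_volume_row(left_feat, right_feat, max_disp=3)
print(soft_argmax_disparity(volume[4], temperature=0.1))

# Two equally good neighbors produce a sub-pixel estimate between them.
print(soft_argmax_disparity([9.0, 0.0, 0.0, 9.0]))
```

Unlike a hard argmin, the soft-argmax is differentiable, so a network can be trained end to end through it, and it returns fractional disparities when the probability mass is split between neighboring candidates.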

Isaac ROS NITROS Acceleration

This package is powered by NVIDIA Isaac Transport for ROS (NITROS), which leverages type adaptation and negotiation to optimize message formats and dramatically accelerate communication between participating nodes.

Performance

Sample Graph	Input Size	AGX Thor (throughput, latency)	x86_64 w/ RTX 5090 (throughput, latency)
DNN Stereo Disparity Node Full	576p	178 fps, 22 ms @ 30Hz	350 fps, 5.6 ms @ 30Hz
DNN Stereo Disparity Node Light	288p	350 fps, 9.4 ms @ 30Hz	350 fps, 5.0 ms @ 30Hz
DNN Stereo Disparity Graph Full	576p	73.6 fps, 29 ms @ 30Hz	348 fps, 8.5 ms @ 30Hz
DNN Stereo Disparity Graph Light	288p	219 fps, 17 ms @ 30Hz	350 fps, 7.3 ms @ 30Hz
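One way to read these numbers is to compare each latency against the frame period of a 30 Hz camera (about 33.3 ms). The sketch below does that arithmetic for the AGX Thor column. It assumes the latency figure is the per-frame processing latency measured with a 30 Hz input, which is an interpretation of the table rather than something stated in it; the helper names are hypothetical.

```python
def frame_period_ms(rate_hz):
    """Time between consecutive frames at the given input rate."""
    return 1000.0 / rate_hz


def latency_headroom_ms(latency_ms, rate_hz):
    """Slack left in one frame period after processing (negative means over budget)."""
    return frame_period_ms(rate_hz) - latency_ms


# (throughput in fps, latency in ms) copied from the AGX Thor column of the table.
AGX_THOR = {
    "Node Full (576p)": (178.0, 22.0),
    "Node Light (288p)": (350.0, 9.4),
    "Graph Full (576p)": (73.6, 29.0),
    "Graph Light (288p)": (219.0, 17.0),
}

for name, (fps, latency_ms) in AGX_THOR.items():
    headroom = latency_headroom_ms(latency_ms, 30.0)
    print(f"{name}: {headroom:.1f} ms headroom, sustains 30 Hz: {fps >= 30.0}")
```

Under that reading, every AGX Thor configuration finishes within one 30 Hz frame period, with the full-resolution graph leaving the least slack.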

Documentation

Please visit the Isaac ROS Documentation to learn how to use this repository.

Packages

Latest

Update 2025-10-24: Added FoundationStereo package

