Depth Estimation
Predicting a depth value (distance from the camera) for every pixel of a 2D image to generate a depth map.
Depth estimation assigns a distance to every pixel, turning ordinary 2D images into 3D-aware input for AR, robotics, and autonomous driving.
Explanation
Monocular depth estimation infers depth from a single image, with no stereo pair or active sensor. Models such as MiDaS and Depth Anything (2024) predict either relative depth (correct near-to-far ordering, but unknown scale and shift) or, in metric variants, absolute depth in meters.
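To make this concrete, here is a minimal sketch of running an off-the-shelf model through the Hugging Face transformers depth-estimation pipeline. The checkpoint id is an assumption; any compatible MiDaS or Depth Anything checkpoint published on the Hub should work the same way.

```python
# Minimal monocular depth estimation sketch using the Hugging Face
# "depth-estimation" pipeline. The checkpoint id is an assumed example.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",  # assumed checkpoint id
)

image = Image.open("street_scene.jpg")  # any ordinary RGB photo
result = depth_estimator(image)

# The pipeline returns the raw tensor ("predicted_depth") and a
# visualization-ready PIL image ("depth") at the input resolution.
result["depth"].save("depth_map.png")
```

Note that the output here is relative depth: values order pixels by nearness but carry no units, which matters for the scale ambiguity discussed under Common Pitfalls.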
Marketing Relevance
Depth estimation brings 3D reconstruction, AR effects, autonomous driving, and robotics within reach of ordinary cameras, with no dedicated depth hardware required.
Example
Smartphones use monocular depth estimation for portrait-mode bokeh without dedicated depth-sensor hardware: the predicted depth map separates the subject from the background so the background can be blurred, as sketched below.
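As an illustration of the idea (not any vendor's actual camera pipeline), this sketch blends a sharp and a blurred copy of a photo using a predicted depth map, such as the one saved in the pipeline example above. The 0.5 foreground threshold is an arbitrary assumption; MiDaS and Depth Anything output inverse depth, so higher values mean closer.

```python
# Hypothetical depth-guided bokeh: keep near pixels sharp, blur far ones.
import numpy as np
from PIL import Image, ImageFilter

image = Image.open("portrait.jpg").convert("RGB")
depth_img = Image.open("depth_map.png").convert("L").resize(image.size)
depth = np.asarray(depth_img, dtype=np.float32) / 255.0  # normalize to [0, 1]

# Blur the whole frame once, then blend per pixel based on depth.
blurred = image.filter(ImageFilter.GaussianBlur(radius=8))
mask = (depth > 0.5).astype(np.float32)[..., None]  # assumed foreground cutoff

sharp = np.asarray(image, dtype=np.float32)
soft = np.asarray(blurred, dtype=np.float32)
bokeh = mask * sharp + (1.0 - mask) * soft  # sharp subject, blurred background
Image.fromarray(bokeh.astype(np.uint8)).save("portrait_bokeh.jpg")
```

Real camera pipelines apply soft, depth-proportional blur rather than a hard threshold, but the principle is the same.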
Common Pitfalls
Monocular depth is inherently ambiguous: a single image cannot distinguish a small object nearby from a large one far away, so predictions are only defined up to an unknown scale. Models are also weak on reflective and transparent surfaces, where apparent depth does not match the physical surface.
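The scale ambiguity is why relative predictions are usually aligned to metric ground truth before evaluation. Below is a minimal sketch of the standard least-squares scale-and-shift alignment (as used, for example, in MiDaS-style benchmarking); the function name is ours:

```python
# Fit scale s and shift t minimizing || s * pred + t - gt ||^2 over
# pixels with valid ground truth, then apply them to the prediction.
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align a relative depth map `pred` to metric ground truth `gt`."""
    pred_flat, gt_flat = pred.ravel(), gt.ravel()
    valid = gt_flat > 0  # pixels where the depth sensor gave a reading
    A = np.stack([pred_flat[valid], np.ones(valid.sum())], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt_flat[valid], rcond=None)
    return s * pred + t
```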
Origin & History
Saxena et al. (2006) presented one of the first learning-based approaches to monocular depth estimation. MiDaS (Intel, 2020) brought robust cross-dataset generalization through mixed-dataset training. Depth Anything (TikTok/ByteDance, 2024) achieved state-of-the-art results with a foundation-model approach trained on large-scale unlabeled data.
Comparisons & Differences
Depth Estimation vs. Stereo Vision
Stereo vision computes depth geometrically from the disparity between two calibrated cameras. Monocular depth estimation uses only one image and must learn depth cues from training data.
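For contrast, stereo depth is pure geometry: a point's depth follows from the triangulation formula Z = f * B / d, where f is the focal length in pixels, B the camera baseline, and d the disparity. A toy calculation with made-up numbers:

```python
# Stereo triangulation: Z = f * B / d. All values below are illustrative.
focal_px = 700.0      # assumed focal length in pixels
baseline_m = 0.12     # assumed distance between the two cameras
disparity_px = 21.0   # pixel shift of a matched point between the views

depth_m = focal_px * baseline_m / disparity_px
print(f"depth = {depth_m:.2f} m")  # -> depth = 4.00 m
```

A monocular model has no such formula available and must learn equivalent cues (perspective, object size, texture gradients) from data.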
Depth Estimation vs. LiDAR
LiDAR measures depth actively with laser pulses, yielding precise but relatively sparse measurements at significant hardware cost. Depth estimation predicts dense depth passively from images: cheaper and denser, but less accurate.