Recovering Dense Metric Depth in Indoor Scenes from Monocular Depth Foundation Models and 2D LiDARs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Foundation models for monocular depth estimation, such as Depth Anything, have recently emerged. However, because they are trained to make affine-invariant predictions, these models rely on fine-tuning to produce metric depth and therefore perform poorly on zero-shot metric depth estimation. In a real use case, the fine-tuning stage is costly because a dedicated dataset with ground-truth depth must be collected and used as a training set. Additionally, fine-tuning can compromise the model's generalization ability. This paper proposes leveraging 2D LiDARs to rescale Depth Anything's predictions in indoor scenes, avoiding both expensive fine-tuning and its impact on the model's generalization. Our experiments demonstrate performance comparable to fine-tuned approaches and improved results over zero-shot metric depth estimation methods.
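For illustration, such a rescaling can be sketched as a least-squares alignment between the prediction and the sparse metric ranges returned by the 2D LiDAR. Since Depth Anything (following MiDaS) predicts affine-invariant inverse depth, a natural baseline is to fit a scale and shift in inverse-depth space at the pixels where the scan projects into the image. This is a minimal sketch under those assumptions, not necessarily the paper's exact method; all function and variable names are hypothetical.

```python
import numpy as np

def align_to_metric(pred_disp, lidar_ranges, lidar_pixels, eps=1e-6):
    """Recover dense metric depth from an affine-invariant disparity map.

    Solves least squares for scale s and shift t such that
        s * pred_disp + t  ~=  1 / lidar_depth
    at the pixels where the 2D LiDAR scan projects into the image,
    then inverts the aligned disparity to obtain metric depth.
    """
    rows, cols = lidar_pixels                 # pixel coordinates of projected LiDAR returns
    x = pred_disp[rows, cols]                 # predicted relative disparity at those pixels
    y = 1.0 / np.maximum(lidar_ranges, eps)   # metric inverse depth from the LiDAR
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    metric_disp = np.maximum(s * pred_disp + t, eps)
    return 1.0 / metric_disp                  # dense metric depth map (meters)
```

In practice one would likely make this fit robust to outliers (e.g., with RANSAC or an M-estimator), since LiDAR returns and predictions can disagree at depth discontinuities.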

Original language: English
Title of host publication: European Robotics Forum 2025 - Boosting the Synergies between Robotics and AI for a Stronger Europe
Editors: Marco Huber, Alexander Verl, Werner Kraus
Publisher: Springer Nature
Pages: 236-241
Number of pages: 6
ISBN (Print): 9783031894701
DOIs
Publication status: Published - 1 Jan 2025
Externally published: Yes
Event: 16th European Robotics Forum, ERF 2025 - Stuttgart, Germany
Duration: 25 Mar 2025 - 27 Mar 2025

Publication series

Name: Springer Proceedings in Advanced Robotics
Volume: 36 SPAR
ISSN (Print): 2511-1256
ISSN (Electronic): 2511-1264

Conference

Conference: 16th European Robotics Forum, ERF 2025
Country/Territory: Germany
City: Stuttgart
Period: 25/03/25 - 27/03/25

Keywords

  • Zero-shot metric monocular depth estimation
