TY - GEN
T1 - PANO-ECHO
T2 - 2nd IEEE Conference on Artificial Intelligence, CAI 2024
AU - Liu, Xiaohu
AU - Brunetto, Amandine
AU - Hornauer, Sascha
AU - Moutarde, Fabien
AU - Lu, Jialiang
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Panoramic depth estimation gains importance as 360° images become widely available. However, traditional mono-to-depth approaches, optimized for a limited field of view, show subpar performance when naively adapted. Methods tailored to process panoramic input improve predictions but cannot overcome the ambiguous visual information and scale uncertainty inherent to the task. In this paper we show the benefits of leveraging sound for improved panoramic depth estimation. Specifically, we harness audible echoes from emitted chirps, as they contain rich geometric and material cues about the surrounding environment. We show that these auditory cues can enhance a state-of-the-art panoramic depth prediction framework. By integrating sound information, we improve this vision-only baseline by ≈12%. Our approach requires minimal modifications to the underlying architecture, making it easily applicable to other baseline models. We validate its efficacy on the Matterport3D and Replica datasets, demonstrating remarkable improvements in depth estimation accuracy. Our code is available here: https://github.com/peter12398/PANO-ECHO
AB - Panoramic depth estimation gains importance as 360° images become widely available. However, traditional mono-to-depth approaches, optimized for a limited field of view, show subpar performance when naively adapted. Methods tailored to process panoramic input improve predictions but cannot overcome the ambiguous visual information and scale uncertainty inherent to the task. In this paper we show the benefits of leveraging sound for improved panoramic depth estimation. Specifically, we harness audible echoes from emitted chirps, as they contain rich geometric and material cues about the surrounding environment. We show that these auditory cues can enhance a state-of-the-art panoramic depth prediction framework. By integrating sound information, we improve this vision-only baseline by ≈12%. Our approach requires minimal modifications to the underlying architecture, making it easily applicable to other baseline models. We validate its efficacy on the Matterport3D and Replica datasets, demonstrating remarkable improvements in depth estimation accuracy. Our code is available here: https://github.com/peter12398/PANO-ECHO
KW - audio-visual learning
KW - multi-modal fusion
KW - panoramic depth estimation
UR - https://www.scopus.com/pages/publications/85201228934
U2 - 10.1109/CAI59869.2024.00193
DO - 10.1109/CAI59869.2024.00193
M3 - Conference contribution
AN - SCOPUS:85201228934
T3 - Proceedings - 2024 IEEE Conference on Artificial Intelligence, CAI 2024
SP - 1063
EP - 1070
BT - Proceedings - 2024 IEEE Conference on Artificial Intelligence, CAI 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 25 June 2024 through 27 June 2024
ER -