Deep Reinforcement Learning for Audio-Visual Gaze Control

  • Stéphane Lathuilière
  • Benoit Massé
  • Pablo Mesejo
  • Radu Horaud

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We address the problem of audio-visual gaze control in the specific context of human-robot interaction, namely how controlled robot motions are combined with visual and acoustic observations in order to direct the robot head towards targets of interest. The paper makes the following contributions: (i) a novel audio-visual fusion framework that is well suited for controlling the gaze of a robotic head; (ii) a reinforcement learning (RL) formulation for the gaze control problem, using a reward function based on the available temporal sequence of camera and microphone observations; and (iii) several deep architectures that allow experimentation with early and late fusion of audio and visual data. We introduce a simulated environment that enables us to learn the proposed deep RL model without the need for hours of tedious interaction. By thoroughly experimenting on a publicly available dataset and on a real robot, we provide empirical evidence that our method achieves state-of-the-art performance.

Original language: English
Title of host publication: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1555-1562
Number of pages: 8
ISBN (Electronic): 9781538680940
DOIs
Publication status: Published - 27 Dec 2018
Externally published: Yes
Event: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018 - Madrid, Spain
Duration: 1 Oct 2018 to 5 Oct 2018

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems
ISSN (Print): 2153-0858
ISSN (Electronic): 2153-0866

Conference

Conference: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018
Country/Territory: Spain
City: Madrid
Period: 1/10/18 to 5/10/18
