TY - GEN
T1 - Learning visual voice activity detection with an automatically annotated dataset
AU - Guy, Sylvain
AU - Lathuilière, Stéphane
AU - Mesejo, Pablo
AU - Horaud, Radu
N1 - Publisher Copyright: © 2020 IEEE
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient either because the acoustic signal is difficult to analyze or because it is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, available datasets, used for learning and for testing V-VAD, lack content variability. We introduce a novel methodology to automatically create and annotate very large datasets in-the-wild - WildVVAD - based on combining A-VAD with face detection and tracking. A thorough empirical evaluation shows the advantage of training the proposed deep V-VAD models with this dataset.
AB - Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient either because the acoustic signal is difficult to analyze or because it is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, available datasets, used for learning and for testing V-VAD, lack content variability. We introduce a novel methodology to automatically create and annotate very large datasets in-the-wild - WildVVAD - based on combining A-VAD with face detection and tracking. A thorough empirical evaluation shows the advantage of training the proposed deep V-VAD models with this dataset.
UR - https://www.scopus.com/pages/publications/85110413723
U2 - 10.1109/ICPR48806.2021.9412884
DO - 10.1109/ICPR48806.2021.9412884
M3 - Conference contribution
AN - SCOPUS:85110413723
T3 - Proceedings - International Conference on Pattern Recognition
SP - 4851
EP - 4856
BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Pattern Recognition, ICPR 2020
Y2 - 10 January 2021 through 15 January 2021
ER -