Abstract
Automatic emotion recognition from video streams is an essential challenge for various applications, including human behavior understanding, mental disease diagnosis, surveillance, and human-machine interaction. In this paper, we introduce a novel, fully automatic, multimodal emotion recognition framework based on the fusion of audio and visual information, designed to leverage the mutually complementary nature of the features while maintaining the modality-distinctive information. Specifically, we integrate spatial, channel and temporal attention into the visual processing pipeline and temporal self-attention into the audio branch. Then, a multimodal cross-attention fusion strategy is introduced that effectively exploits the relationship between the audio and video features. The experimental evaluation performed on RAVDESS, a publicly available database, validates the proposed approach with average accuracy scores above 87.85%. When compared with state-of-the-art methods, the proposed framework returns accuracy gains of more than 1.85%.
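The cross-attention fusion idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes both modalities have already been projected to a common feature dimension, uses plain scaled dot-product attention without learned weights, and adds a residual sum as one possible way to keep modality-specific information. The function names (`cross_attention`, `fuse`) and the toy feature values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query step attends over all key steps."""
    dim = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended

def mean_pool(sequence):
    """Average a sequence of feature vectors over time."""
    steps = len(sequence)
    return [sum(step[j] for step in sequence) / steps for j in range(len(sequence[0]))]

def fuse(audio, video):
    """Hypothetical fusion: bidirectional cross-attention plus residual, then concatenation."""
    audio_ctx = cross_attention(audio, video, video)  # audio queries attend to video
    video_ctx = cross_attention(video, audio, audio)  # video queries attend to audio
    # Residual sum (an assumption) keeps each modality's own features in the result.
    audio_out = [[a + c for a, c in zip(s, ctx)] for s, ctx in zip(audio, audio_ctx)]
    video_out = [[v + c for v, c in zip(s, ctx)] for s, ctx in zip(video, video_ctx)]
    return mean_pool(audio_out) + mean_pool(video_out)

# Toy inputs: 2 audio steps and 3 video steps, each with 2 features.
audio_features = [[1.0, 0.0], [0.0, 1.0]]
video_features = [[1.0, 1.0], [0.5, 0.5], [0.0, 2.0]]
fused = fuse(audio_features, video_features)
```

The fused vector has twice the common feature dimension and would typically be passed to a classifier; in a trained model the queries, keys and values would come from learned linear projections rather than the raw features.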
| Original language | English |
|---|---|
| Title | Advances in Visual Computing - 17th International Symposium, ISVC 2022, Proceedings |
| Editors | George Bebis, Bo Li, Angela Yao, Yang Liu, Ye Duan, Manfred Lau, Rajiv Khadka, Ana Crisan, Remco Chang |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 295-306 |
| Number of pages | 12 |
| ISBN (print) | 9783031207150 |
| DOIs | |
| Status | Published - 1 Jan 2022 |
| Event | 17th International Symposium on Visual Computing, ISVC 2022 - San Diego, United States. Duration: 3 Oct 2022 → 5 Oct 2022 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 13599 LNCS |
| ISSN (print) | 0302-9743 |
| ISSN (electronic) | 1611-3349 |
Conference
| Conference | 17th International Symposium on Visual Computing, ISVC 2022 |
|---|---|
| Country/Territory | United States |
| City | San Diego |
| Period | 3 Oct 2022 → 5 Oct 2022 |
UN SDGs
This output contributes to the following Sustainable Development Goal(s)
- SDG 3 Good Health and Well-being
Fingerprint
Explore the research topics of "Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms". Together they form a unique fingerprint.