TY - GEN
T1 - Improving Multimodal fusion via Mutual Dependency Maximisation
AU - Colombo, Pierre
AU - Chapuis, Emile
AU - Labeau, Matthieu
AU - Clavel, Chloe
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Multimodal sentiment analysis is a trending area of research, and the multimodal fusion is one of its most active topic. Acknowledging humans communicate through a variety of channels (i.e visual, acoustic, linguistic), multimodal systems aim at integrating different unimodal representations into a synthetic one. So far, a consequent effort has been made on developing complex architectures allowing the fusion of these modalities. However, such systems are mainly trained by minimising simple losses such as L1 or cross-entropy. In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities. We demonstrate that our new penalties lead to a consistent improvement (up to 4.3 on accuracy) across a large variety of state-of-the-art models on two well-known sentiment analysis datasets: CMU-MOSI and CMU-MOSEI. Our method not only achieves a new SOTA on both datasets but also produces representations that are more robust to modality drops. Finally, a by-product of our methods includes a statistical network which can be used to interpret the high dimensional representations learnt by the model.
AB - Multimodal sentiment analysis is a trending area of research, and the multimodal fusion is one of its most active topic. Acknowledging humans communicate through a variety of channels (i.e visual, acoustic, linguistic), multimodal systems aim at integrating different unimodal representations into a synthetic one. So far, a consequent effort has been made on developing complex architectures allowing the fusion of these modalities. However, such systems are mainly trained by minimising simple losses such as L1 or cross-entropy. In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities. We demonstrate that our new penalties lead to a consistent improvement (up to 4.3 on accuracy) across a large variety of state-of-the-art models on two well-known sentiment analysis datasets: CMU-MOSI and CMU-MOSEI. Our method not only achieves a new SOTA on both datasets but also produces representations that are more robust to modality drops. Finally, a by-product of our methods includes a statistical network which can be used to interpret the high dimensional representations learnt by the model.
UR - https://www.scopus.com/pages/publications/85127435313
U2 - 10.18653/v1/2021.emnlp-main.21
DO - 10.18653/v1/2021.emnlp-main.21
M3 - Conference contribution
AN - SCOPUS:85127435313
T3 - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 231
EP - 245
BT - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Y2 - 7 November 2021 through 11 November 2021
ER -