Speech emotion recognition using GhostVLAD and sentiment metric learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we introduce a novel deep learning-based speech emotion recognition method. The proposed approach exploits a convolutional neural network (CNN) enriched with a GhostVLAD feature aggregation layer. The resulting representation adjusts the contribution of each spectrogram segment to the final class prototype representation and is used for trainable, discriminative clustering. In addition, we introduce a modified triplet loss function that integrates the relations between the various emotional patterns. The experimental evaluation, carried out on the RAVDESS and CREMA-D datasets, validates the proposed methodology, which yields emotion recognition rates above 83% and 64%, respectively. The comparative evaluation shows that the proposed approach outperforms state-of-the-art techniques, with accuracy gains of more than 3%.
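
As a rough illustration of the aggregation step mentioned in the abstract, the following is a minimal NumPy sketch of a generic GhostVLAD forward pass. It is not the authors' implementation: the function name `ghostvlad`, the parameter shapes, and the random inputs are assumptions made for this example, and in a real network the centres, weights and biases would be learned end to end on CNN features of spectrogram segments. Each segment descriptor is softly assigned to K real clusters plus G "ghost" clusters, and the ghost assignments are discarded, so segments that mostly fall into ghost clusters contribute little to the final vector.

```python
import numpy as np

def ghostvlad(features, centers, weights, biases, num_ghost):
    """Aggregate N local descriptors of size D into one (K*D,) vector.

    features: (N, D) local descriptors (hypothetical CNN outputs).
    centers:  (K + G, D) cluster centres; the last G rows are ghost clusters.
    weights:  (D, K + G) and biases: (K + G,) define the soft assignment.
    """
    logits = features @ weights + biases               # (N, K+G)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)        # softmax over all clusters

    num_real = centers.shape[0] - num_ghost
    assign = assign[:, :num_real]                      # drop ghost assignments

    # Residuals of every descriptor to every real centre: (N, K, D)
    residuals = features[:, None, :] - centers[None, :num_real, :]
    vlad = (assign[:, :, None] * residuals).sum(axis=0)            # (K, D)

    # Intra-normalisation per cluster, then global L2 normalisation
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.reshape(-1)
    return vlad / (np.linalg.norm(vlad) + 1e-12)

# Toy usage with random values (assumed sizes: N=20, D=8, K=4, G=2)
rng = np.random.default_rng(0)
N, D, K, G = 20, 8, 4, 2
descriptor = ghostvlad(
    features=rng.normal(size=(N, D)),
    centers=rng.normal(size=(K + G, D)),
    weights=rng.normal(size=(D, K + G)),
    biases=np.zeros(K + G),
    num_ghost=G,
)
```

The output has K*D entries and unit L2 norm. The modified triplet loss is not sketched here because the abstract does not specify its form.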

Original language: English
Title of host publication: ISPA 2021 - 12th International Symposium on Image and Signal Processing and Analysis
Editors: Tomislav Petkovic, Davor Petrinovic, Sven Loncaric
Publisher: IEEE Computer Society
Pages: 126-130
Number of pages: 5
ISBN (Electronic): 9781665426398
Publication status: Published - 13 Sept 2021
Event: 12th International Symposium on Image and Signal Processing and Analysis, ISPA 2021 - Virtual, Zagreb, Croatia
Duration: 13 Sept 2021 to 15 Sept 2021

Publication series

Name: International Symposium on Image and Signal Processing and Analysis, ISPA
Volume: 2021-September
ISSN (Print): 1845-5921
ISSN (Electronic): 1849-2266

Conference

Conference: 12th International Symposium on Image and Signal Processing and Analysis, ISPA 2021
Country/Territory: Croatia
City: Virtual, Zagreb
Period: 13/09/21 to 15/09/21

Keywords

  • Convolutional neural networks
  • Emotional metric learning
  • GhostVLAD aggregation
  • Speech emotion recognition
