TY - GEN
T1 - Speech emotion recognition using GhostVLAD and sentiment metric learning
AU - Mocanu, Bogdan
AU - Tapu, Ruxandra
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/9/13
Y1 - 2021/9/13
AB - In this paper, we introduce a novel deep learning-based speech emotion recognition method. The proposed approach exploits a convolutional neural network (CNN) enriched with a GhostVLAD feature aggregation layer. The resulting representation adjusts the contribution of each spectrogram segment to the final class prototype representation and is used for trainable and discriminative clustering. In addition, we introduce a modified triplet loss function that integrates the relations between the various emotional patterns. The experimental evaluation, carried out on the RAVDESS and CREMA-D datasets, validates the proposed methodology, which yields emotion recognition rates above 83% and 64%, respectively. The comparative evaluation shows that the proposed approach outperforms state-of-the-art techniques, with gains in accuracy of more than 3%.
KW - Convolutional neural networks
KW - Emotional metric learning
KW - GhostVLAD aggregation
KW - Speech emotion recognition
U2 - 10.1109/ISPA52656.2021.9552068
DO - 10.1109/ISPA52656.2021.9552068
M3 - Conference contribution
AN - SCOPUS:85116991153
T3 - International Symposium on Image and Signal Processing and Analysis, ISPA
SP - 126
EP - 130
BT - ISPA 2021 - 12th International Symposium on Image and Signal Processing and Analysis
A2 - Petkovic, Tomislav
A2 - Petrinovic, Davor
A2 - Loncaric, Sven
PB - IEEE Computer Society
T2 - 12th International Symposium on Image and Signal Processing and Analysis, ISPA 2021
Y2 - 13 September 2021 through 15 September 2021
ER -