
Combining monaural source separation with long short-term memory for increased robustness in vocalist gender recognition

  • Felix Weninger
  • Jean Louis Durrieu
  • Florian Eyben
  • Gaël Richard
  • Björn Schuller
  • Technical University of Munich
  • ENAC-IIC-GEL
  • CNRS LTCI

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

We present a novel and unique combination of algorithms to detect the gender of the leading vocalist in recorded popular music. Building on our previous successful approach that enhanced the harmonic parts by means of Non-Negative Matrix Factorization (NMF) for increased accuracy, we integrate on the one hand a new source separation algorithm specifically tailored to extracting the leading voice from monaural recordings. On the other hand, we introduce Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) as context-sensitive classifiers for this scenario, which have lately led to great success in Music Information Retrieval tasks. Through a combination of leading voice separation and BLSTM networks, as opposed to a baseline approach using Hidden Naive Bayes on the original recordings, the accuracy of simultaneous detection of vocal presence and vocalist gender on beat level is improved by up to 10% absolute. Furthermore, using this technique we achieve 91.6% accuracy in determining the gender of the predominant vocalist on song level, which is 4% absolute above our previous best result.

Original language: English
Title of host publication: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 2196-2199
Number of pages: 4
State: Published - 18 August 2011
Externally published: Yes
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 22 May 2011 to 27 May 2011

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (print): 1520-6149

Conference

Conference: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country/Territory: Czech Republic
City: Prague
Period: 22/05/11 to 27/05/11
