GLA-GRAD: A GRIFFIN-LIM EXTENDED WAVEFORM GENERATION DIFFUSION MODEL

  • Haocheng Liu
  • Teysir Baoueb
  • Mathieu Fontaine
  • Jonathan Le Roux
  • Gaël Richard
  • Institut Polytechnique de Paris
  • Mitsubishi Electric Research Laboratories

Research output: Chapter in a book, report, anthology, or collection · Conference contribution · Peer-reviewed

Abstract

Diffusion models are receiving growing interest for a variety of signal generation tasks such as speech or music synthesis. WaveGrad, for example, is a successful diffusion model that conditions on the mel spectrogram to guide a diffusion process for the generation of high-fidelity audio. However, such models face important challenges concerning the noise diffusion process for training and inference, and they have difficulty generating high-quality speech for speakers that were not seen during training. With the aim of minimizing the conditioning error and increasing the efficiency of the noise diffusion process, we propose in this paper a new scheme called GLA-Grad, which consists of introducing a phase recovery algorithm such as the Griffin-Lim algorithm (GLA) at each step of the regular diffusion process. Furthermore, it can be directly applied to an already-trained waveform generation model, without additional training or fine-tuning. We show that our algorithm outperforms state-of-the-art diffusion models for speech generation, especially when generating speech for a previously unseen target speaker.
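
The idea of interleaving a phase recovery step with the reverse diffusion steps can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear magnitude spectrogram as the conditioning target (the abstract refers to a mel spectrogram), a hand-written STFT with illustrative parameters `N_FFT` and `HOP`, and a hypothetical `placeholder_denoise_step` that stands in for a trained model's reverse step and simply returns its input. Only the Griffin-Lim projection (keep the current phase, impose the target magnitude, invert) is the standard technique.

```python
import numpy as np

N_FFT, HOP = 64, 16                      # illustrative values, not from the paper
WINDOW = np.hanning(N_FFT + 1)[:-1]      # periodic Hann window


def stft(x):
    """Short-time Fourier transform; returns an array of shape (frames, bins)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Least-squares inverse STFT: windowed overlap-add with normalization."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)


def gla_projection(x, target_mag):
    """One Griffin-Lim iteration: keep the phase of x, impose target_mag."""
    phase = np.exp(1j * np.angle(stft(x)))
    return istft(target_mag * phase, len(x))


def spectral_error(x, target_mag):
    """Distance between the magnitude spectrogram of x and the target."""
    return np.linalg.norm(np.abs(stft(x)) - target_mag)


def placeholder_denoise_step(x, step):
    """Hypothetical stand-in for a trained model's reverse step (identity)."""
    return x


def reverse_process_with_gla(x, target_mag, denoise_step, n_steps):
    """Run n_steps reverse steps, each followed by a Griffin-Lim projection."""
    for step in reversed(range(n_steps)):
        x = denoise_step(x, step)
        x = gla_projection(x, target_mag)
    return x


# Toy usage: the target magnitude comes from a sine wave, the start is noise.
rng = np.random.default_rng(0)
length = N_FFT + HOP * 60
reference = np.sin(2 * np.pi * 0.05 * np.arange(length))
target_mag = np.abs(stft(reference))
x_T = rng.standard_normal(length)
x_0 = reverse_process_with_gla(x_T, target_mag, placeholder_denoise_step, n_steps=20)
print(spectral_error(x_T, target_mag), spectral_error(x_0, target_mag))
```

Because the projection only needs the current waveform estimate and the target magnitude, it can be wrapped around an existing reverse step without retraining, which is consistent with the abstract's claim that the scheme applies to an already-trained model.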

Original language: English
Host publication title: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 11611-11615
Number of pages: 5
ISBN (Electronic): 9798350344851
Publication status: Published - 1 Jan 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, South Korea
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: South Korea
City: Seoul
Period: 14/04/24 to 19/04/24
