
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis

  • Teysir Baoueb
  • Haocheng Liu
  • Mathieu Fontaine
  • Jonathan Le Roux
  • Gael Richard

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

Generative adversarial network (GAN) models can synthesize high-quality audio signals while ensuring fast sample generation. However, they are difficult to train and are prone to several issues, including mode collapse and divergence. In this paper, we introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN, which was initially devised for speech synthesis from mel spectrograms. In our model, training stability is enhanced by a forward diffusion process that injects noise from a Gaussian distribution into both real and fake samples before they are fed to the discriminator. We further improve the model by exploiting a spectrally-shaped noise distribution to make the discriminator's task more challenging. We then show the merits of our proposed model for speech and music synthesis on several datasets. Our experiments confirm that our model compares favorably with several baselines in audio quality and efficiency.
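The noise-injection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard variance-preserving forward diffusion step, a caller-supplied `betas` schedule, and a first-order FIR filter as a hypothetical stand-in for spectral shaping, since the abstract does not say how the noise is shaped. All function names are illustrative.

```python
import math
import random


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta) up to and including step t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def white_noise(n, rng):
    """Draw n independent standard Gaussian samples."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def shape_noise(noise, coeff):
    """Hypothetical spectral shaping (assumption, not from the paper):
    y[n] = (x[n] + coeff * x[n-1]) / sqrt(1 + coeff^2).
    The rescaling keeps the variance approximately 1 for white input."""
    scale = math.sqrt(1.0 + coeff * coeff)
    out, prev = [], 0.0
    for x in noise:
        out.append((x + coeff * prev) / scale)
        prev = x
    return out


def diffuse(signal, t, betas, noise):
    """Assumed forward step: sqrt(a_bar) * x + sqrt(1 - a_bar) * noise."""
    a = alpha_bar(t, betas)
    return [math.sqrt(a) * s + math.sqrt(1.0 - a) * e
            for s, e in zip(signal, noise)]


def noisy_discriminator_inputs(real, fake, betas, rng, coeff=0.0):
    """Noise both real and fake samples at the same random step t
    before they would be passed to a discriminator."""
    t = rng.randrange(len(betas))
    noisy = []
    for x in (real, fake):
        eps = shape_noise(white_noise(len(x), rng), coeff)
        noisy.append(diffuse(x, t, betas, eps))
    return noisy[0], noisy[1], t
```

With `coeff=0.0` the sketch reduces to plain Gaussian noise injection; a nonzero `coeff` tilts the noise spectrum, which is only meant to show where a shaped distribution would enter the pipeline.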

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 986-990
Number of pages: 5
ISBN (Electronic): 9798350344851
Publication status: Published - 1 Jan 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, South Korea
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: South Korea
City: Seoul
Period: 14/04/24 to 19/04/24
