Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones

Research output: Contribution to journal › Article › peer-review

Abstract

We review, in a common framework, several recently proposed algorithms for improving the voice quality of text-to-speech synthesis based on the concatenation of acoustical units (Charpentier and Moulines, 1988; Moulines and Charpentier, 1988; Hamon et al., 1989). These algorithms rely on a pitch-synchronous overlap-add (PSOLA) approach for modifying the speech prosody and concatenating speech waveforms. The speech signal is modified either in the frequency domain (FD-PSOLA), using the Fast Fourier Transform, or directly in the time domain (TD-PSOLA), depending on the length of the window used in the synthesis process. The frequency-domain approach offers great flexibility in modifying the spectral characteristics of the speech signal, while the time-domain approach provides very efficient solutions for the real-time implementation of synthesis systems. We also discuss the kinds of distortion that each of these algorithms introduces.
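As a rough illustration of the time-domain variant described in the abstract, the following is a minimal sketch of pitch-synchronous overlap-add for pitch modification. It is not the authors' implementation. It assumes that pitch marks are already available as sample indices, uses a Hann window spanning two pitch periods, and picks the nearest analysis mark for each synthesis mark. The names `hann`, `td_psola`, `pitch_marks` and `pitch_scale` are hypothetical and chosen only for this example.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola(signal, pitch_marks, pitch_scale):
    """Toy time-domain PSOLA (illustrative assumption, not the paper's code).

    signal:      list of float samples
    pitch_marks: increasing sample indices, one per pitch period (assumed given)
    pitch_scale: values above 1 raise the pitch, values below 1 lower it
    The output has the same length as the input.
    """
    out = [0.0] * len(signal)

    # Local pitch period at each analysis mark (distance to the next mark).
    periods = [b - a for a, b in zip(pitch_marks, pitch_marks[1:])]
    periods.append(periods[-1])

    t = float(pitch_marks[0])  # position of the current synthesis mark
    while t < pitch_marks[-1]:
        # Choose the analysis mark closest to the synthesis mark.
        k = min(range(len(pitch_marks)), key=lambda i: abs(pitch_marks[i] - t))
        period = periods[k]
        center = pitch_marks[k]

        # Extract a two-period windowed segment and overlap-add it
        # at the synthesis mark.
        win = hann(2 * period + 1)
        base = int(round(t)) - period
        for j, w in enumerate(win):
            src = center - period + j
            dst = base + j
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += w * signal[src]

        # The spacing between synthesis marks sets the new pitch period.
        t += period / pitch_scale
    return out
```

With `pitch_scale=1.0` and evenly spaced marks, the overlapping Hann windows sum to one between marks, so the interior of the signal is reproduced unchanged. Other values change the spacing of the repeated segments and hence the perceived pitch.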

Original language: English
Pages (from-to): 453-467
Number of pages: 15
Journal: Speech Communication
Volume: 9
Issue number: 5-6
Publication status: Published - 1 Jan 1990
Externally published: Yes

Keywords

  • Text-to-speech synthesis
  • pitch-synchronous overlap-add (PSOLA)
  • voice quality
