Abstract
We review, in a common framework, several recently proposed algorithms for improving the voice quality of text-to-speech synthesis based on the concatenation of acoustical units (Charpentier and Moulines, 1988; Moulines and Charpentier, 1988; Hamon et al., 1989). These algorithms rely on a pitch-synchronous overlap-add (PSOLA) approach for modifying the speech prosody and concatenating speech waveforms. The speech signal is modified either in the frequency domain (FD-PSOLA), using the Fast Fourier Transform, or directly in the time domain (TD-PSOLA), depending on the length of the window used in the synthesis process. The frequency-domain approach offers great flexibility in modifying the spectral characteristics of the speech signal, while the time-domain approach provides very efficient solutions for real-time implementation of synthesis systems. We also discuss the kinds of distortion introduced by each of these algorithms.
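The time-domain idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the pitch marks are already known, uses a Hann window spanning two local pitch periods, and applies no gain normalization. The function name `td_psola` and its parameters `marks` and `pitch_factor` are hypothetical names chosen for this illustration.

```python
import numpy as np

def td_psola(signal, marks, pitch_factor):
    """Rough TD-PSOLA-style pitch scaling (illustrative sketch only).

    signal       -- 1-D array of speech samples
    marks        -- increasing sample indices of analysis pitch marks (assumed given)
    pitch_factor -- > 1 raises the pitch, < 1 lowers it
    """
    signal = np.asarray(signal, dtype=float)
    marks = np.asarray(marks)
    periods = np.diff(marks)          # local pitch periods in samples
    out = np.zeros(len(signal))

    t = float(marks[0])               # current synthesis pitch mark
    while t < marks[-1]:
        # Pick the analysis mark closest to the synthesis mark.
        i = int(np.argmin(np.abs(marks - t)))
        p = int(periods[min(i, len(periods) - 1)])

        # Extract a two-period segment centred on the analysis mark.
        start, end = marks[i] - p, marks[i] + p + 1
        if start >= 0 and end <= len(signal):
            segment = signal[start:end] * np.hanning(2 * p + 1)

            # Overlap-add the segment centred on the synthesis mark.
            c = int(round(t))
            lo, hi = c - p, c + p + 1
            if lo >= 0 and hi <= len(out):
                out[lo:hi] += segment

        # Synthesis marks are spaced by the scaled local period.
        t += p / pitch_factor
    return out
```

With `pitch_factor = 1` and a constant pitch period, the overlapping Hann windows sum to one between the first and last synthesis marks, so that part of the input is reproduced unchanged. With other factors the windows overlap more or less densely, so the output amplitude changes unless a normalization step is added.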
| Original language | English |
|---|---|
| Pages (from-to) | 453-467 |
| Number of pages | 15 |
| Journal | Speech Communication |
| Volume | 9 |
| Issue number | 5-6 |
| Publication status | Published - 1 Jan 1990 |
| Externally published | Yes |
Keywords
- Text-to-speech synthesis
- pitch-synchronous overlap-add (PSOLA)
- voice quality
Fingerprint
Dive into the research topics of 'Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones'. Together they form a unique fingerprint.