Abstract
FFT (fast Fourier transform) synthesis algorithms are presented for a French text-to-speech system based on diphone concatenation. FFT synthesis techniques can produce high-quality prosodic modifications of natural speech. Several approaches to reducing the distortions caused by diphone concatenation are proposed. They rely on appropriate manipulation of the phase spectrum, either by equalizing the phase across all diphones or by smoothing the phase between successive diphones. The resulting speech is of significantly better quality than that produced by conventional linear predictive coding (LPC) synthesis. In a further experiment, all FFTs were performed offline to reduce the computational cost. The resulting speech is slightly degraded relative to fully FFT-synthesized speech, but it remains more natural than LPC-synthesized speech.
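The abstract does not specify how the phase spectrum is equalized or smoothed. The following is a minimal sketch under stated assumptions, not the authors' algorithm. Phase equalization is modeled as replacing each frame's phase spectrum with a common reference phase (zero phase here, an assumption), so frames taken from different diphones no longer disagree in phase. Phase smoothing is modeled as interpolating magnitude and the wrapped phase difference bin by bin across a concatenation boundary. The function names `equalize_phase`, `smooth_phase`, and `wrap_phase`, and the idea of operating on fixed-length frames, are hypothetical.

```python
import numpy as np


def wrap_phase(phase):
    """Wrap phase values into the interval [-pi, pi)."""
    return (phase + np.pi) % (2 * np.pi) - np.pi


def equalize_phase(frame, ref_phase=None):
    """Hypothetical phase equalization: keep the magnitude spectrum of
    `frame` and impose a common reference phase (zero phase by default,
    an assumption), so all frames share the same phase spectrum."""
    spectrum = np.fft.rfft(frame)
    magnitude = np.abs(spectrum)
    if ref_phase is None:
        ref_phase = np.zeros_like(magnitude)
    return np.fft.irfft(magnitude * np.exp(1j * ref_phase), n=len(frame))


def smooth_phase(frame_a, frame_b, num_steps):
    """Hypothetical phase smoothing: build `num_steps` intermediate frames
    whose magnitude and phase move from frame_a (end of one diphone) toward
    frame_b (start of the next). The phase difference is wrapped first so
    each bin takes the shorter way around the unit circle."""
    spec_a = np.fft.rfft(frame_a)
    spec_b = np.fft.rfft(frame_b)
    mag_a, mag_b = np.abs(spec_a), np.abs(spec_b)
    phase_a = np.angle(spec_a)
    dphase = wrap_phase(np.angle(spec_b) - phase_a)
    frames = []
    for k in range(num_steps):
        t = (k + 1) / (num_steps + 1)  # strictly between 0 and 1
        mag = (1 - t) * mag_a + t * mag_b
        phase = phase_a + t * dphase
        frames.append(np.fft.irfft(mag * np.exp(1j * phase), n=len(frame_a)))
    return frames


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(64)
    shifted = np.roll(frame, 5)  # same magnitude spectrum, different phase
    # After equalization, the phase mismatch between the two frames is gone.
    print(np.allclose(equalize_phase(frame), equalize_phase(shifted)))
    print(len(smooth_phase(frame, shifted, num_steps=4)))
```

A circularly shifted frame has the same magnitude spectrum as the original, so after equalization both produce the same output; this is the sense in which a shared phase spectrum removes phase discontinuities at a join. Interpolating the wrapped phase difference, rather than the raw angles, avoids spurious jumps of nearly 2π in individual bins.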
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 667-670 |
| Number of pages | 4 |
| Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
| Publication status | Published - 1 Jan 1988 |
| Externally published | Yes |