Abstract
A novel time-domain algorithm is presented for text-to-speech synthesis using diphone concatenation. The algorithm is based on the pitch-synchronous overlap-add (PSOLA) approach and is capable of good quality prosodic modifications of natural speech. The algorithm can be seen as a simplification of a previous algorithm combining the PSOLA approach and frequency-domain transformations. On the other hand, it appears as a generalization of previous time-domain methods that perform pitch synchronous cut-and-splice operations on the speech waveform. This algorithm is used in the CNET diphone synthesis multilingual system, actually supporting three languages: French, Italian, and German. The resulting speech has been tested on French and is judged of much better quality than for an LPC-based synthesizer.
| Original language | English |
|---|---|
| Pages (from-to) | 238-241 |
| Number of pages | 4 |
| Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
| Volume | 1 |
| Publication status | Published - 1 Dec 1989 |
| Externally published | Yes |
| Event | 1989 International Conference on Acoustics, Speech, and Signal Processing - Glasgow, Scotland Duration: 23 May 1989 → 26 May 1989 |