Abstract
The main features of the CNET diphone-based text-to-speech system for the French language are described. The linguistic analysis works in three steps. First, a morphosyntactic analysis module assigns a grammatical category to each word in the text and transcribes it phonetically. Second, a parsing module groups the text into hierarchical syntactico-prosodic units. Finally, prosodic patterns are automatically assigned to each word by querying a database of prosodic events. The resulting phonetic and prosodic information drives the synthesis component, which is based on diphone concatenation. A time-domain formulation of the pitch-synchronous overlap-add scheme (TD-PSOLA) is used both to modify the speech prosody and to concatenate the diphone waveforms. TD-PSOLA is combined with a low-bit-rate speech decoder to reduce the memory needed to store the diphone inventory. The system runs in real time on a PC equipped with a TMS320C25 DSP board and provides notably better sound quality and naturalness than commercially available systems.
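The abstract does not spell out how TD-PSOLA modifies pitch. As a minimal sketch, assuming analysis pitch marks are already available and that only pitch (not duration) is changed, the pitch-synchronous overlap-add step can be written as below. The names `td_psola`, `local_periods` and `hann`, the two-period Hann window, and the nearest-mark selection rule are illustrative assumptions, not the CNET implementation.

```python
import math


def hann(length):
    """Symmetric Hann window; zero at both ends when length > 1."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1)) for i in range(length)]


def local_periods(marks):
    """Estimate the pitch period (in samples) around each analysis pitch mark."""
    periods = []
    for k in range(len(marks)):
        if k == 0:
            p = marks[1] - marks[0]
        elif k == len(marks) - 1:
            p = marks[-1] - marks[-2]
        else:
            p = (marks[k + 1] - marks[k - 1]) // 2
        periods.append(max(1, p))
    return periods


def td_psola(x, marks, pitch_factor):
    """Illustrative TD-PSOLA pitch modification (duration unchanged).

    x: list of samples; marks: increasing analysis pitch-mark indices
    (assumed to be given); pitch_factor > 1 raises F0, < 1 lowers it.
    Returns (output samples, synthesis pitch marks).
    """
    if len(marks) < 2:
        return list(x), list(marks)
    periods = local_periods(marks)
    y = [0.0] * len(x)
    synth_marks = []
    t = float(marks[0])
    while t <= marks[-1]:
        # Pick the analysis mark closest in time to the synthesis mark.
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t))
        p = periods[k]
        w = hann(2 * p + 1)  # two-period window centred on the analysis mark
        centre, out = marks[k], int(round(t))
        synth_marks.append(out)
        # Overlap-add the windowed short-term signal at the synthesis mark.
        for i in range(-p, p + 1):
            src, dst = centre + i, out + i
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += x[src] * w[i + p]
        # Synthesis period = analysis period / pitch_factor.
        t += p / pitch_factor
    return y, synth_marks
```

With `pitch_factor=1.0` and evenly spaced marks, adjacent windows overlap by one period and sum to one, so the samples between the first and last mark are reproduced unchanged. With `pitch_factor=2.0`, the same short-term segments are re-placed half a period apart, which doubles F0 while keeping the overall duration.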
| Field | Value |
|---|---|
| Original language | English |
| Pages | 309-312 |
| Number of pages | 4 |
| Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
| Volume | 1 |
| Publication status | Published, 1 Dec 1990 |
| Externally published | Yes |
| Event | 1990 International Conference on Acoustics, Speech, and Signal Processing: Speech Processing 2, VLSI, Audio and Electroacoustics Part 2 (of 5) |
| Location | Albuquerque, New Mexico, USA |
| Dates | 3 Apr 1990 to 6 Apr 1990 |