A real-time French text-to-speech system generating high-quality synthetic speech

  • E. Moulines
  • F. Emerard
  • D. Larreur
  • J. L. Le Saint Milon
  • L. Le Faucheur
  • F. Marty
  • F. Charpentier
  • C. Sorin

Research output: Contribution to journal › Conference article › Peer-review

Abstract

The main features of the CNET diphone-based text-to-speech system for the French language are described. The linguistic analysis works in three steps. First, a morphosyntactic analysis module assigns a grammatical value to each word in the text and transcribes it phonetically. A second module parses the text into hierarchical syntactico-prosodic groups. Finally, prosodic patterns are automatically assigned to each word by querying a database of prosodic events. The phonetic and prosodic information serves as commands to the synthesis component, which is based on diphone concatenation. A time-domain formulation of the pitch-synchronous overlap-add scheme (TD-PSOLA) is used both to modify the speech prosody and to concatenate diphone waveforms. It is combined with a low bit-rate speech decoder to reduce the memory required for storing the diphone inventory. The system runs in real time on a PC equipped with a TMS320C25 DSP board and provides notably improved sound quality and naturalness compared with commercially available systems.
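To make the TD-PSOLA step more concrete, the following is a minimal Python sketch of pitch-synchronous overlap-add prosody modification, not the CNET implementation. It assumes the analysis pitch marks are already known as strictly increasing integer sample indices on a voiced signal (the abstract does not say how marks are obtained), uses a two-period Hann window per mark, and picks the nearest analysis mark for each synthesis mark. The function name `td_psola` and the parameters `pitch_factor` and `time_factor` are hypothetical. Diphone concatenation, the low bit-rate decoder, and the fixed-point DSP implementation are not modelled.

```python
import bisect
import math


def td_psola(x, marks, pitch_factor=1.0, time_factor=1.0):
    """Illustrative TD-PSOLA sketch (hypothetical helper, not the paper's code).

    x            -- list of float samples (assumed voiced)
    marks        -- strictly increasing integer analysis pitch marks (assumed given)
    pitch_factor -- F0 multiplier (2.0 raises pitch by one octave)
    time_factor  -- duration multiplier (1.5 makes the output 50% longer)
    """
    if len(marks) < 2 or pitch_factor <= 0 or time_factor <= 0:
        raise ValueError("need >= 2 pitch marks and positive factors")
    # Local pitch period at each mark: distance to the next mark.
    periods = [b - a for a, b in zip(marks, marks[1:])]
    periods.append(periods[-1])  # the last mark reuses the previous period

    out_len = int(round(len(x) * time_factor))
    y = [0.0] * out_len
    t_s = marks[0] * time_factor  # first synthesis mark
    while t_s < out_len:
        # Map the synthesis instant back to the analysis time axis and
        # pick the nearest analysis pitch mark.
        t_a = t_s / time_factor
        j = bisect.bisect_left(marks, t_a)
        i = min((c for c in (j - 1, j) if 0 <= c < len(marks)),
                key=lambda c: abs(marks[c] - t_a))
        p, center, dst0 = periods[i], marks[i], int(round(t_s))
        # Overlap-add a two-period Hann-windowed segment centred on the mark.
        # With unmodified spacing p, adjacent windows sum exactly to 1.
        for k in range(-p, p):
            src, dst = center + k, dst0 + k
            if 0 <= src < len(x) and 0 <= dst < out_len:
                y[dst] += (0.5 + 0.5 * math.cos(math.pi * k / p)) * x[src]
        # The next synthesis mark is one *modified* period later.
        t_s += p / pitch_factor
    return y
```

For example, `td_psola(x, marks, pitch_factor=1.2)` would raise F0 by about 20% while keeping the duration. Choosing the nearest analysis mark means that time-scaling repeats or skips whole pitch periods, which is the usual PSOLA trade-off between simplicity and smoothness.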

Original language: English
Pages (from-to): 309-312
Number of pages: 4
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
Publication status: Published - 1 Dec 1990
Externally published: Yes
Event: 1990 International Conference on Acoustics, Speech, and Signal Processing: Speech Processing 2, VLSI, Audio and Electroacoustics Part 2 (of 5) - Albuquerque, New Mexico, USA
Duration: 3 Apr 1990 to 6 Apr 1990
