Voice transformation using PSOLA technique

Research output: Contribution to journal › Article › peer-review

Abstract

In this contribution, a new system for voice conversion is described. The proposed architecture combines a PSOLA (Pitch Synchronous Overlap and Add)-derived synthesizer and a module for spectral transformation. The synthesizer, based on the classical source-filter decomposition, allows prosodic and spectral transformations to be performed independently. Prosodic modifications are applied to the excitation signal using the TD-PSOLA scheme; converted speech is then synthesized using the transformed spectral parameters. Two different approaches to deriving spectral transformations, both borrowed from the speech-recognition domain, are compared: Linear Multivariate Regression (LMR) and Dynamic Frequency Warping (DFW). Vector quantization is carried out as a preliminary stage to make the spectral transformations dependent on the acoustical realization of sounds. A formal listening test shows that the synthesizer produces a satisfyingly natural "transformed" voice. LMR proves to allow a slightly better conversion than DFW. Still, there is room for improvement in the spectral transformation stage.
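To illustrate the class-dependent regression idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that source and target spectral vectors are already paired frame by frame (for example by time alignment), that a codebook has been trained beforehand, and that each class gets its own affine map fitted by ordinary least squares. The function names `nearest_codeword`, `fit_lmr`, and `convert` are hypothetical.

```python
import numpy as np

def nearest_codeword(x, codebook):
    """Return, for each row of x, the index of the closest codeword (Euclidean)."""
    dist = np.linalg.norm(x[:, None, :] - codebook[None, :, :], axis=2)
    return dist.argmin(axis=1)

def fit_lmr(src, tgt, codebook):
    """Fit one affine map per codebook class by least squares.

    Assumption: src[i] and tgt[i] are time-aligned spectral vectors.
    """
    labels = nearest_codeword(src, codebook)
    maps = {}
    for k in range(len(codebook)):
        sel = labels == k
        if not sel.any():
            continue  # no training frames fell into this class
        # Append a column of ones so the fitted map includes a bias term.
        xa = np.hstack([src[sel], np.ones((sel.sum(), 1))])
        w, *_ = np.linalg.lstsq(xa, tgt[sel], rcond=None)
        maps[k] = w
    return maps

def convert(src, codebook, maps):
    """Apply the class-specific affine map to each source vector."""
    labels = nearest_codeword(src, codebook)
    out = np.array(src, dtype=float)  # frames in unseen classes pass through unchanged
    for k, w in maps.items():
        sel = labels == k
        xa = np.hstack([src[sel], np.ones((sel.sum(), 1))])
        out[sel] = xa @ w
    return out
```

Quantizing first lets each acoustic class have its own linear transformation, which is one plausible reading of making the transformation depend on the acoustical realization of sounds. The prosodic (TD-PSOLA) stage is not covered by this sketch.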

Original language: English
Pages (from-to): 175-187
Number of pages: 13
Journal: Speech Communication
Volume: 11
Issue number: 2-3
Publication status: Published - 1 Jan 1992

Keywords

  • PSOLA analysis-synthesis
  • dynamic frequency warping
  • linear multivariate regression
  • voice conversion
