Abstract
This contribution describes a new system for voice conversion. The proposed architecture combines a synthesizer derived from PSOLA (Pitch Synchronous Overlap and Add) with a module for spectral transformation. The synthesizer, based on the classical source-filter decomposition, allows prosodic and spectral transformations to be performed independently. Prosodic modifications are applied to the excitation signal using the TD-PSOLA scheme; the converted speech is then synthesized using the transformed spectral parameters. Two approaches to deriving spectral transformations, both borrowed from the speech-recognition domain, are compared: Linear Multivariate Regression (LMR) and Dynamic Frequency Warping (DFW). Vector quantization is carried out as a preliminary stage so that the spectral transformations depend on the acoustical realization of sounds. A formal listening test shows that the synthesizer produces a satisfyingly natural "transformed" voice. LMR proves to allow a slightly better conversion than DFW. There is still room for improvement, however, in the spectral transformation stage.
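As a rough illustration of the LMR idea mentioned in the abstract, the sketch below fits one affine least-squares map per vector-quantization class from source-speaker to target-speaker spectral vectors. It is not the authors' implementation: the function names (`fit_lmr`, `apply_lmr`, `fit_per_class`), the use of a bias term, and the assumption that source and target frames are already time-aligned and labelled with a codebook class are all hypothetical choices made for the example.

```python
import numpy as np

def _with_bias(vectors):
    # Append a constant column so the fitted map is affine (assumption).
    return np.hstack([vectors, np.ones((vectors.shape[0], 1))])

def fit_lmr(source, target):
    """Least-squares fit of a matrix W such that [source, 1] @ W ~ target.

    source, target: arrays of shape (n_frames, n_coeffs), assumed to be
    time-aligned frame by frame.
    """
    W, *_ = np.linalg.lstsq(_with_bias(source), target, rcond=None)
    return W

def apply_lmr(W, source):
    """Map source spectral vectors with a fitted regression matrix."""
    return _with_bias(source) @ W

def fit_per_class(source, target, labels):
    """Fit one regression per class label (e.g. a VQ codebook index)."""
    return {
        int(c): fit_lmr(source[labels == c], target[labels == c])
        for c in np.unique(labels)
    }
```

At conversion time, each incoming source frame would be assigned to its class and mapped with that class's matrix. How frames are aligned and which spectral parameters are used are left open here, since the abstract does not specify them.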
| Original language | English |
|---|---|
| Pages (from-to) | 175-187 |
| Number of pages | 13 |
| Journal | Speech Communication |
| Volume | 11 |
| Issue number | 2-3 |
| Publication status | Published - 1 Jan 1992 |
Keywords
- PSOLA analysis-synthesis
- dynamic frequency warping
- linear multivariate regression
- voice conversion