TY - GEN
T1 - AnCoGen
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
AU - Sadok, Samir
AU - Leglaive, Simon
AU - Girin, Laurent
AU - Richard, Gaël
AU - Alameda-Pineda, Xavier
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
AB - This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within a single model. AnCoGen can analyze speech by estimating key attributes, such as speaker identity, pitch, content, loudness, signal-to-noise ratio, and clarity index. In addition, it can generate speech from these attributes and allow precise control of the synthesized speech by modifying them. Extensive experiments demonstrated the effectiveness of AnCoGen across speech analysis-resynthesis, pitch estimation, pitch modification, and speech enhancement. Code and audio examples are available online.
KW - Speech analysis/transformation/synthesis
KW - masked autoencoder
KW - pitch estimation and modification
KW - speech enhancement
UR - https://www.scopus.com/pages/publications/105003876228
U2 - 10.1109/ICASSP49660.2025.10887856
DO - 10.1109/ICASSP49660.2025.10887856
M3 - Conference contribution
AN - SCOPUS:105003876228
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
A2 - Rao, Bhaskar D.
A2 - Trancoso, Isabel
A2 - Sharma, Gaurav
A2 - Mehta, Neelesh B.
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 April 2025 through 11 April 2025
ER -