TY - GEN
T1 - First Attempt of Gender-free Speech Style Transfer for Genderless Robot
AU - Yu, Chuang
AU - Fu, Changzeng
AU - Chen, Rui
AU - Tapus, Adriana
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Some robots for human-robot interaction are designed with a female or male physical appearance. Others, such as Pepper and NAO, have no gender characteristics and are referred to as genderless robots. A robot with a male or female physical appearance should speak with the matching gender style during natural human-robot interaction, and that style can be learned from human male or female speech. In this paper, we make a first attempt to synthesize gender-free speech for physically genderless robots, which is a promising way to make human-robot interaction with genderless robots more natural. Our gender style-controlled speech synthesizer takes the speech text and a gender style embedding as inputs and generates speech audio. A speech gender encoder network extracts the gender style embedding from female and male speech. Based on the distribution of the female and male gender style embeddings, we explore the gender-free speech style embedding space, from which we sample gender-free embedding vectors to generate genderless speech audio. This is preliminary work in which we show how genderless speech audio is synthesized from text.
KW - genderless robot speech
KW - speech style transfer
KW - text-to-speech synthesis
U2 - 10.1109/HRI53351.2022.9889533
DO - 10.1109/HRI53351.2022.9889533
M3 - Conference contribution
AN - SCOPUS:85140729133
T3 - ACM/IEEE International Conference on Human-Robot Interaction
SP - 1110
EP - 1113
BT - HRI 2022 - Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction
PB - IEEE Computer Society
T2 - 17th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2022
Y2 - 7 March 2022 through 10 March 2022
ER -