TY - JOUR
T1 - Detection of Early Parkinson's Disease by Leveraging Speech Foundation Models
AU - Dao, Quang
AU - Jeancolas, Laetitia
AU - Mangone, Graziella
AU - Sambin, Sara
AU - Chalançon, Alizé
AU - Gomes, Manon
AU - Lehéricy, Stéphane
AU - Corvol, Jean Christophe
AU - Vidailhet, Marie
AU - Arnulf, Isabelle
AU - Delacrétaz, Dijana Petrovska
AU - El-Yacoubi, Mounîm A.
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
AB - Parkinson's disease (PD) is a progressive neurodegenerative disorder affecting millions worldwide, characterized by a wide range of motor and non-motor symptoms. Among these symptoms, alterations in speech and voice quality stand out as early and prominent indicators of the disease. Recently, the emergence of speech foundation models has revolutionized the field by providing powerful tools for speech processing and feature extraction. In this article, we investigate the capabilities of three state-of-the-art speech foundation models, wav2vec2.0, Whisper and SeamlessM4T, to develop robust and accurate methods for PD detection from voice recordings. We experiment with both direct feature extraction and finetuning of the foundation models for the PD classification task, and validate the results against clinical and neuroimaging data. We achieve promising results with both pretrained features and finetuned models, with finetuning providing stronger performance, reaching an AUC of up to 91.35%, a new state of the art on the ICEBERG dataset. The predictions of our models also correlate well with clinical and DaTSCAN scores, demonstrating the feasibility of applying speech foundation models to the detection of early PD.
KW - Parkinson's Disease
KW - clinical scores
KW - early detection
KW - neuroimaging
KW - speech foundation models
KW - voice analysis
UR - https://www.scopus.com/pages/publications/86000795292
U2 - 10.1109/JBHI.2025.3548917
DO - 10.1109/JBHI.2025.3548917
M3 - Article
AN - SCOPUS:86000795292
SN - 2168-2194
VL - 29
SP - 5181
EP - 5190
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 7
ER -