TY - GEN
T1 - A Mixed Audio-Video SPD Network for Online Classification of Parkinsonian Speech Patterns
AU - Archila, John
AU - Manzanera, Antoine
AU - Martínez, Fabio
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - Parkinson’s disease (PD) is a neurodegenerative disease that produces progressive motor impairments. Dysarthria (speech disorder) and hypomimia (facial rigidity) are two major parkinsonian patterns observed even at the early stages of the disease. Nonetheless, clinical diagnosis is mainly observational and depends on the specialist’s expertise. Moreover, each of these patterns is assessed in isolation, which may lead to delayed diagnosis and poorly planned treatment. This work introduces a non-invasive multimodal strategy that integrates video and audio modalities for the online characterization of speech exercises. Subjects were asked to pronounce sustained vowels while video and audio were recorded. A temporal window is then slid along the sequence to build online covariance matrices from synchronized facial landmark positions and characteristic voice frequencies. Riemannian descriptors learned from these temporal covariance matrices are used to discriminate between Parkinson’s and control subjects. In a study with 14 subjects, the proposed approach achieved a mean accuracy of 70% in sustained vowel pronunciation. For online predictions, the proposed approach showed a consistent accuracy of 77% during the pronunciation of close vowels.
KW - Mixed audio-video SPD networks
KW - Online Parkinson’s disease prediction
UR - https://www.scopus.com/pages/publications/86000451519
U2 - 10.1007/978-3-031-80366-6_10
DO - 10.1007/978-3-031-80366-6_10
M3 - Conference contribution
AN - SCOPUS:86000451519
SN - 9783031803659
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 110
EP - 121
BT - Advances in Artificial Intelligence – IBERAMIA 2024 - 18th Ibero-American Conference on AI, Proceedings
A2 - Correia, Luís
A2 - Rosá, Aiala
A2 - Garijo, Francisco
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th Ibero-American Conference on Artificial Intelligence, IBERAMIA 2024
Y2 - 13 November 2024 through 15 November 2024
ER -