TY - GEN
T1 - Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning
AU - Sankaran, Aditya Narayan
AU - Farahbakhsh, Reza
AU - Crespi, Noel
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025/1/1
Y1 - 2025/1/1
AB - Online abusive content detection, particularly in low-resource settings and within the audio modality, remains underexplored. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, in this case Indian languages, using Few-Shot Learning (FSL). Leveraging powerful representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection using the ADIMA dataset with FSL. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with various shot sizes (50-200), evaluating the impact of limited data on performance. Additionally, a feature visualization study was conducted to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource scenarios and offers valuable insights into detecting abusive language in multilingual contexts.
UR - https://www.scopus.com/pages/publications/85218488176
M3 - Conference contribution
AN - SCOPUS:85218488176
T3 - Proceedings - International Conference on Computational Linguistics, COLING
SP - 5558
EP - 5569
BT - Main Conference
A2 - Rambow, Owen
A2 - Wanner, Leo
A2 - Apidianaki, Marianna
A2 - Al-Khalifa, Hend
A2 - Di Eugenio, Barbara
A2 - Schockaert, Steven
PB - Association for Computational Linguistics (ACL)
T2 - 31st International Conference on Computational Linguistics, COLING 2025
Y2 - 19 January 2025 through 24 January 2025
ER -