TY - GEN
T1 - Enhancing Multilingual Fake News Detection Through LLM-Based Data Augmentation
AU - Chalehchaleh, Razieh
AU - Farahbakhsh, Reza
AU - Crespi, Noel
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025/1/1
Y1 - 2025/1/1
AB - The rapid growth of online news consumption has intensified the spread of misinformation, underscoring the critical need for effective fake news detection methods. Despite significant advances in this area, the scarcity and inadequacy of the high-quality labeled datasets needed to train effective detection models remain a major challenge. In this paper, we introduce a novel approach to address this issue by leveraging large language models (LLMs) for data augmentation. Specifically, we employ Llama 3 to generate multiple synthetic news samples per original article, enriching existing fake news datasets to enhance fake news detection. We explore various augmentation strategies, including different augmentation rates, random or similarity-based subsampling, and selectively augmenting data from specific classes, to optimize the augmented datasets for training better classifiers. We evaluate the efficacy of our approach using BERT-based classifiers on two multilingual datasets. Our findings reveal notable improvements, particularly when augmenting only the fake class at an augmentation rate of 1.
KW - Data Augmentation
KW - Large Language Models (LLMs)
KW - Multilingual Fake News Detection
UR - https://www.scopus.com/pages/publications/105002048436
U2 - 10.1007/978-3-031-82435-7_21
DO - 10.1007/978-3-031-82435-7_21
M3 - Conference contribution
AN - SCOPUS:105002048436
SN - 9783031824340
T3 - Studies in Computational Intelligence
SP - 258
EP - 270
BT - Complex Networks and Their Applications XIII - Proceedings of The 13th International Conference on Complex Networks and Their Applications
A2 - Cherifi, Hocine
A2 - Donduran, Murat
A2 - Rocha, Luis M.
A2 - Cherifi, Chantal
A2 - Varol, Onur
PB - Springer Science and Business Media Deutschland GmbH
T2 - 13th International Conference on Complex Networks and their Applications, COMPLEX NETWORKS 2024
Y2 - 10 December 2024 through 12 December 2024
ER -