
Addressing data scarcity in multilingual fake news detection: an LLM-based dataset augmentation approach

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

The rise in online news consumption, especially during critical events, coupled with rapid advances in generative artificial intelligence (AI), has accelerated the spread of misinformation, underscoring the urgent need for fast and effective fake news detection approaches. However, the scarcity and imbalance of high-quality labeled datasets pose significant challenges to training accurate and reliable detection models. In this study, we tackle this issue by leveraging Large Language Models (LLMs) for data augmentation. Expanding upon our prior work, we employ Llama 3 to generate synthetic news samples under zero-shot and few-shot settings, enriching existing fake news datasets to improve the performance of detection models. To optimize augmentation effectiveness, we explore several strategies, including varying augmentation rates, random versus similarity-based subsampling, and class-specific augmentation. Our experiments, using BERT-based classifiers on two real-world multilingual datasets, reveal that selectively augmenting only the fake news class at lower rates typically yields the most consistent improvements, with similarity-based subsampling slightly outperforming random selection. The augmentation approach led to F1 score improvements of up to 7.7 points in some languages. Additionally, while few-shot-generated samples generally exhibit greater similarity to the original ones, their impact on classification remains inconsistent. These findings highlight the potential of LLM-driven data augmentation, when carefully tuned, to enhance fake news detection.
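The class-specific augmentation and the two subsampling strategies described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify. The synthetic samples are taken as already generated (for example by Llama 3), so the generation step is not shown. The augmentation rate is treated as the ratio of synthetic to original samples in the target class. Similarity is measured with a bag-of-words cosine score against the closest original sample, which stands in for whatever similarity measure the study actually used. The function names `select_synthetic` and `augment` are hypothetical.

```python
import math
import random
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector (token counts) for a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_synthetic(originals, synthetic, rate, strategy="similarity", seed=0):
    """Pick up to rate * len(originals) synthetic texts for one class.

    Assumption: 'rate' is the ratio of synthetic to original samples.
    """
    k = min(len(synthetic), round(rate * len(originals)))
    if k == 0:
        return []
    if strategy == "random":
        # Random subsampling: uniform choice without replacement.
        return random.Random(seed).sample(synthetic, k)
    # Similarity-based subsampling (assumed criterion): keep the synthetic
    # texts whose best cosine match against any original text is highest.
    original_vecs = [bow(t) for t in originals]

    def best_match(text):
        vec = bow(text)
        return max(cosine(vec, o) for o in original_vecs)

    return sorted(synthetic, key=best_match, reverse=True)[:k]


def augment(dataset, synthetic_by_label, rate, target_label="fake",
            strategy="similarity"):
    """Augment only the target class; other classes are left unchanged."""
    originals = [text for text, label in dataset if label == target_label]
    pool = synthetic_by_label.get(target_label, [])
    chosen = select_synthetic(originals, pool, rate, strategy)
    return dataset + [(text, target_label) for text in chosen]


# Toy usage with invented example texts.
dataset = [
    ("vaccine causes magnets in arm", "fake"),
    ("moon landing staged in studio", "fake"),
    ("council approves new budget", "real"),
]
synthetic = {"fake": ["vaccine makes arm magnetic",
                      "budget meeting held today",
                      "aliens built the pyramids"]}
augmented = augment(dataset, synthetic, rate=0.5)
print(augmented[-1])  # ('vaccine makes arm magnetic', 'fake')
```

With a rate of 0.5 and two original fake samples, only one synthetic text is added, and the similarity criterion selects the one sharing the most vocabulary with an existing fake sample. Keeping the rate low and touching only the fake class mirrors the configuration the abstract reports as most consistent, although the actual classifiers were BERT-based and are not modeled here.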

Original language: English
Article number: 92
Journal: Social Network Analysis and Mining
Volume: 15
Issue number: 1
DOIs
Publication status: Published, 1 Dec 2025
