Abstract
The rise in online news consumption, especially during critical events, coupled with rapid advances in generative artificial intelligence (AI), has accelerated the spread of misinformation, underscoring the urgent need for fast and effective fake news detection approaches. However, the scarcity and imbalance of high-quality labeled datasets pose significant challenges to training accurate and reliable detection models. In this study, we tackle this issue by leveraging Large Language Models (LLMs) for data augmentation. Expanding upon our prior work, we employ Llama 3 to generate synthetic news samples under zero-shot and few-shot settings, enriching existing fake news datasets to improve the performance of detection models. To optimize augmentation effectiveness, we explore several strategies, including varying augmentation rates, random versus similarity-based subsampling, and class-specific augmentation. Our experiments, using BERT-based classifiers on two real-world multilingual datasets, reveal that selectively augmenting only the fake news class at lower rates typically yields the most consistent improvements, with similarity-based subsampling slightly outperforming random selection. The augmentation approach led to F1 score improvements of up to 7.7 points in some languages. Additionally, while few-shot-generated samples generally exhibit greater similarity to the original ones, their impact on classification remains inconsistent. These findings highlight the potential of LLM-driven data augmentation, when carefully tuned, to enhance fake news detection.
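The selection step described in the abstract (augmenting only the fake class, at a chosen rate, with random or similarity-based subsampling of the generated samples) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the synthetic samples have already been generated, uses bag-of-words cosine similarity as a stand-in for whatever similarity measure the study used, and all function names, labels, and example texts are hypothetical.

```python
import math
import random
from collections import Counter


def bow(text):
    """Bag-of-words vector; a simple stand-in for a real text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def subsample(originals, synthetic, rate, strategy="similarity", seed=0):
    """Pick round(rate * len(originals)) synthetic samples (hypothetical helper)."""
    k = min(len(synthetic), round(rate * len(originals)))
    if k == 0:
        return []
    if strategy == "random":
        return random.Random(seed).sample(synthetic, k)
    # Similarity-based: rank each synthetic sample by its closest original.
    orig_vecs = [bow(t) for t in originals]

    def score(text):
        vec = bow(text)
        return max(cosine(vec, o) for o in orig_vecs)

    return sorted(synthetic, key=score, reverse=True)[:k]


def augment_fake_class(dataset, synthetic_fake, rate, strategy="similarity"):
    """Add selected synthetic samples to the fake class only."""
    fake = [text for text, label in dataset if label == "fake"]
    chosen = subsample(fake, synthetic_fake, rate, strategy)
    return dataset + [(text, "fake") for text in chosen]


# Toy data, invented for illustration only.
dataset = [
    ("miracle cure heals all diseases overnight", "fake"),
    ("secret moon base found by hackers", "fake"),
    ("celebrity replaced by clone says insider", "fake"),
    ("drinking bleach prevents the flu", "fake"),
    ("city council approves new budget", "real"),
]
synthetic_fake = [
    "miracle cure heals diseases overnight doctors stunned",
    "secret moon base found",
    "stock markets closed higher today",
]

augmented = augment_fake_class(dataset, synthetic_fake, rate=0.5)
```

With four original fake samples and a rate of 0.5, two synthetic samples are added, and the real class is left untouched. Passing `strategy="random"` gives the random baseline that the abstract compares against.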
| Original language | English |
|---|---|
| Article number | 92 |
| Journal | Social Network Analysis and Mining |
| Volume | 15 |
| Issue number | 1 |
| Publication status | Published - 1 Dec 2025 |
Keywords
- Data augmentation
- Fake news detection
- Few-shot and zero-shot prompting
- Large language models (LLMs)
- Misinformation detection
Title
Addressing data scarcity in multilingual fake news detection: an LLM-based dataset augmentation approach