Enhancing Multilingual Fake News Detection Through LLM-Based Data Augmentation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The rapid growth of online news consumption has intensified the spread of misinformation, underscoring the critical need for effective fake news detection methods. Despite significant advances in this area, the scarcity and inadequacy of the high-quality labeled datasets needed to train effective detection models remain a major challenge. In this paper, we introduce a novel approach that addresses this issue by leveraging large language models (LLMs) for data augmentation. Specifically, we employ Llama 3 to generate multiple synthetic news samples per original article, enriching existing fake news datasets to improve detection. We explore several augmentation strategies, including different augmentation rates, random or similarity-based subsampling, and selective augmentation of specific classes, to optimize the augmented datasets for training better classifiers. We evaluate the efficacy of our approach using BERT-based classifiers on two multilingual datasets. Our findings reveal notable improvements, particularly when augmenting only the fake class with an augmentation rate of 1.
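The augmentation strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions: "rate" is read here as the number of synthetic samples kept per original article, "similarity-based" subsampling is read as keeping the candidates most similar to the original, and token-set Jaccard overlap stands in for whatever similarity measure the paper uses. The names `Sample`, `jaccard`, and `augment` are hypothetical, and the synthetic texts are assumed to have been generated beforehand (for example by an LLM such as Llama 3).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    text: str
    label: str  # "fake" or "real"


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def augment(originals, synthetic, rate, classes=("fake",),
            strategy="random", seed=0):
    """Return the originals plus up to `rate` synthetic samples per original.

    originals: list of Sample
    synthetic: dict mapping an original's index to its candidate texts
    rate:      synthetic samples kept per original (assumed meaning of "rate")
    classes:   labels whose articles are augmented; others are left as-is
    strategy:  "random" or "similarity" subsampling of the candidates
    """
    rng = random.Random(seed)
    augmented = list(originals)
    for i, sample in enumerate(originals):
        if sample.label not in classes:
            continue  # selective augmentation: skip classes not requested
        candidates = synthetic.get(i, [])
        k = min(rate, len(candidates))
        if strategy == "random":
            chosen = rng.sample(candidates, k)
        elif strategy == "similarity":
            # Assumption: keep the candidates closest to the original text.
            ranked = sorted(candidates,
                            key=lambda c: jaccard(sample.text, c),
                            reverse=True)
            chosen = ranked[:k]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        # Synthetic samples inherit the label of their source article.
        augmented.extend(Sample(c, sample.label) for c in chosen)
    return augmented


# Toy data for illustration only.
originals = [
    Sample("aliens land in paris", "fake"),
    Sample("council approves budget", "real"),
]
synthetic = {
    0: ["aliens land in paris today", "unrelated text here"],
    1: ["council passes budget"],
}

# Augment only the fake class with rate 1.
train_set = augment(originals, synthetic, rate=1,
                    classes=("fake",), strategy="similarity")
```

Under these assumptions, `train_set` holds the two original articles plus one synthetic fake sample, and it could then be passed to any classifier training routine.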

Original language: English
Title of host publication: Complex Networks and Their Applications XIII - Proceedings of The 13th International Conference on Complex Networks and Their Applications
Subtitle of host publication: COMPLEX NETWORKS 2024 - Volume 3
Editors: Hocine Cherifi, Murat Donduran, Luis M. Rocha, Chantal Cherifi, Onur Varol
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 258-270
Number of pages: 13
ISBN (Print): 9783031824340
Publication status: Published - 1 Jan 2025
Event: 13th International Conference on Complex Networks and their Applications, COMPLEX NETWORKS 2024 - Istanbul, Turkey
Duration: 10 Dec 2024 to 12 Dec 2024

Publication series

Name: Studies in Computational Intelligence
Volume: 1189 SCI
ISSN (Print): 1860-949X
ISSN (Electronic): 1860-9503

Conference

Conference: 13th International Conference on Complex Networks and their Applications, COMPLEX NETWORKS 2024
Country/Territory: Turkey
City: Istanbul
Period: 10/12/24 to 12/12/24

Keywords

  • Data Augmentation
  • Large Language Models (LLMs)
  • Multilingual Fake News Detection
