Multilingual Hate Speech Detection Using Semi-supervised Generative Adversarial Network

Khouloud Mnassri, Reza Farahbakhsh, Noel Crespi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Online communication has overcome linguistic and cultural barriers, enabling global connection through social media platforms. However, this linguistic variety has introduced new challenges for tasks such as hate speech detection. Although many NLP solutions based on advanced machine learning techniques have been proposed, the scarcity of annotated data remains a serious problem, which motivates semi-supervised approaches. This paper proposes a multilingual semi-supervised model based on Generative Adversarial Networks (GAN) and mBERT, named SS-GAN-mBERT. The model detects hate speech in three Indo-European languages (English, German, and Hindi) using only 20% of the labeled data from the HASOC2019 dataset. The approach performed strongly in multilingual, zero-shot cross-lingual, and monolingual settings, achieving on average a 9.23% improvement in F1 score and a 5.75% improvement in accuracy over the baseline mBERT model.
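The abstract does not spell out the training objective. As a rough illustration only, the sketch below assumes a GAN-BERT-style semi-supervised discriminator: mBERT sentence representations (real) and generator outputs (fake) are passed to a classifier with k real classes plus one extra "fake" class. It is trained with a supervised cross-entropy term on the labeled subset and with unsupervised real-versus-fake terms on all examples. The function names `log_softmax` and `discriminator_loss`, the argument shapes, and the loss composition are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def discriminator_loss(real_logits: np.ndarray,
                       fake_logits: np.ndarray,
                       labels: np.ndarray,
                       label_mask: np.ndarray,
                       eps: float = 1e-8) -> float:
    """Hypothetical k+1-class semi-supervised GAN discriminator loss.

    ASSUMPTION: GAN-BERT-style objective, not taken from the paper.
    real_logits: (n_real, k+1) logits for encoded real posts; last column = "fake".
    fake_logits: (n_fake, k+1) logits for generator outputs.
    labels:      (n_real,) class ids in [0, k); ignored where label_mask is False.
    label_mask:  (n_real,) bool, True only for the labeled subset.
    """
    # Supervised term: cross-entropy over the k real classes, labeled rows only.
    sup_log_probs = log_softmax(real_logits[:, :-1])
    safe_labels = np.where(label_mask, labels, 0)  # placeholder ids for unlabeled rows
    picked = sup_log_probs[np.arange(len(safe_labels)), safe_labels]
    sup = -picked[label_mask].mean() if label_mask.any() else 0.0

    # Unsupervised terms: real posts should not be classified as "fake",
    # generated samples should be.
    p_fake_on_real = np.exp(log_softmax(real_logits))[:, -1]
    p_fake_on_fake = np.exp(log_softmax(fake_logits))[:, -1]
    unsup_real = -np.log(1.0 - p_fake_on_real + eps).mean()
    unsup_fake = -np.log(p_fake_on_fake + eps).mean()

    return float(sup + unsup_real + unsup_fake)


if __name__ == "__main__":
    # Two labeled real posts, one unlabeled real post, one generated sample (k = 2).
    real = np.array([[5.0, -5.0, -5.0], [-5.0, 5.0, -5.0], [0.0, 0.0, 0.0]])
    labels = np.array([0, 1, -1])
    mask = np.array([True, True, False])
    print(discriminator_loss(real, np.array([[-5.0, -5.0, 5.0]]), labels, mask))
```

In a setup like this, unlabeled posts still contribute to training through the real-versus-fake terms, which is one way a semi-supervised GAN can make use of the data left unlabeled.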

Original language: English
Title of host publication: Complex Networks and Their Applications XII - Proceedings of The 12th International Conference on Complex Networks and their Applications
Subtitle of host publication: COMPLEX NETWORKS 2023
Editors: Hocine Cherifi, Luis M. Rocha, Chantal Cherifi, Murat Donduran
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 192-204
Number of pages: 13
ISBN (Print): 9783031535024
Publication status: Published - 1 Jan 2024
Event: 12th International Conference on Complex Networks and their Applications, COMPLEX NETWORKS 2023 - Menton, France
Duration: 28 Nov 2023 to 30 Nov 2023

Publication series

Name: Studies in Computational Intelligence
Volume: 1144 SCI
ISSN (Print): 1860-949X
ISSN (Electronic): 1860-9503

Conference

Conference: 12th International Conference on Complex Networks and their Applications, COMPLEX NETWORKS 2023
Country/Territory: France
City: Menton
Period: 28/11/23 to 30/11/23

Keywords

  • GAN
  • Hate Speech
  • mBERT
  • multilingual
  • offensive language
  • semi-supervised
  • social media
