
Language-Agnostic Method for Sentiment Analysis of Twitter

  • Amir Reza Jafari
  • Reza Farahbakhsh
  • Mostafa Salehi
  • Noel Crespi
  • Telecom Sudparis
  • University of Tehran

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

With the events and crises unfolding today, Twitter plays an essential role in sharing thoughts, opinions, and news worldwide in many languages. Understanding the sentiment of user-generated content has attracted considerable interest in both industry and academia. Because little data is available for low-resource languages, working with multilingual resources is a limiting and challenging issue in sentiment analysis. Considering the importance of pre-processing in a sentiment analysis system, we propose a two-step method for pre-processing tweets in different languages: (i) a language-agnostic step that replaces or removes some elements of the Twitter data structure, and (ii) a text-normalization step based on the main high-resource language. In addition, we used machine translation to translate low-resource-language texts into the main language. We evaluated sentiment classification with four deep models: an RNN model and three BERT-based architectures, namely a vanilla version, a language-specific model, and a large-scale model pre-trained for Twitter. The results show that our method achieved better accuracy when used with the large-scale BERT-based pre-trained model.
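
The language-agnostic step (i) can be illustrated with a minimal Python sketch. The abstract does not say which Twitter elements are replaced or removed, so everything below is an assumption made for illustration: the choice of elements (retweet prefix, URLs, mentions, hashtags), the placeholder tokens `<url>` and `<user>`, and the function name `language_agnostic_clean`. The normalization step (ii) and the machine-translation step are not shown.

```python
import re

# Hypothetical placeholder tokens; the abstract does not name the ones used.
URL_TOKEN = "<url>"
USER_TOKEN = "<user>"

# Patterns for Twitter-specific elements that do not depend on the tweet's
# language. In Python 3, \w matches Unicode word characters, so mentions and
# hashtags written in non-Latin scripts are matched as well.
_RETWEET_RE = re.compile(r"^RT\s+")
_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_MENTION_RE = re.compile(r"@\w+")
_HASHTAG_RE = re.compile(r"#(\w+)")
_SPACE_RE = re.compile(r"\s+")


def language_agnostic_clean(tweet: str) -> str:
    """Replace or remove Twitter-specific elements (assumed rule set)."""
    text = _RETWEET_RE.sub("", tweet.strip())   # remove a leading "RT "
    text = _URL_RE.sub(URL_TOKEN, text)         # replace links with a token
    text = _MENTION_RE.sub(USER_TOKEN, text)    # replace @handles with a token
    text = _HASHTAG_RE.sub(r"\1", text)         # keep the hashtag word, drop "#"
    return _SPACE_RE.sub(" ", text).strip()     # collapse repeated whitespace


if __name__ == "__main__":
    print(language_agnostic_clean(
        "RT @alice: Check https://t.co/abc #BreakingNews now"
    ))
```

Because the patterns touch only markup common to all tweets, the same function can be applied before any language-specific normalization or translation.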

Original language: English
Title of host publication: Proceedings of Data Analytics and Management - ICDAM 2023
Editors: Abhishek Swaroop, Zdzislaw Polkowski, Sérgio Duarte Correia, Bal Virdee
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 597-606
Number of pages: 10
ISBN (Print): 9789819965465
Publication status: Published - 1 Jan 2024
Event: International Conference on Data Analytics and Management, ICDAM 2023 - Jelenia Gora, Poland
Duration: 23 Jun 2023 to 24 Jun 2023

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 786
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: International Conference on Data Analytics and Management, ICDAM 2023
Country/Territory: Poland
City: Jelenia Gora
Period: 23/06/23 to 24/06/23

Keywords

  • BERT-based approaches
  • Low resource languages
  • NLP
  • Sentiment analysis
  • Twitter
