TY - GEN
T1 - Language-Agnostic Method for Sentiment Analysis of Twitter
AU - Jafari, Amir Reza
AU - Farahbakhsh, Reza
AU - Salehi, Mostafa
AU - Crespi, Noel
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd 2024.
PY - 2024/1/1
Y1 - 2024/1/1
AB - With the different events and crises that we are witnessing these days, Twitter plays an essential role in sharing thoughts, opinions, and news worldwide in various languages. Understanding the sentiment of user-generated content has recently garnered much interest in both the industrial and academic communities. Due to the limited availability of data from low-resource languages, the focus on multilingual resources is a limiting and challenging issue for the sentiment analysis task. Considering the importance of pre-processing in the implementation of a sentiment analysis system, we propose a two-step method for pre-processing tweets in different languages: (i) a language-agnostic step that replaces or removes some elements of the Twitter data structure, and (ii) a text-normalization step based on the main high-resource language. In addition, we used machine translation techniques to translate low-resource language texts into the main language. We evaluated sentiment classification approaches based on four deep models: an RNN model and three BERT-based architectures, namely a vanilla version, a language-specific model, and a large-scale model pre-trained for Twitter. The results show that our method achieved better accuracy when using the large-scale BERT-based pre-trained model.
KW - BERT-based approaches
KW - Low resource languages
KW - NLP
KW - Sentiment analysis
KW - Twitter
U2 - 10.1007/978-981-99-6547-2_46
DO - 10.1007/978-981-99-6547-2_46
M3 - Conference contribution
AN - SCOPUS:85181976075
SN - 9789819965465
T3 - Lecture Notes in Networks and Systems
SP - 597
EP - 606
BT - Proceedings of Data Analytics and Management - ICDAM 2023
A2 - Swaroop, Abhishek
A2 - Polkowski, Zdzislaw
A2 - Correia, Sérgio Duarte
A2 - Virdee, Bal
PB - Springer Science and Business Media Deutschland GmbH
T2 - International Conference on Data Analytics and Management, ICDAM 2023
Y2 - 23 June 2023 through 24 June 2023
ER -