TF-IDF or Transformers for Arabic Dialect Identification? ITFLOWS participation in the NADI 2022 Shared Task

  • Fouad Shammary
  • Yiyi Chen
  • Zsolt T. Kardkovács
  • Haithem Afli
  • Mehwish Alam

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This study targets the shared task on Nuanced Arabic Dialect Identification (NADI), organized with the Workshop on Arabic Natural Language Processing (WANLP). It focuses on Subtask 1: country-level identification of Arabic dialects. Specifically, it first studies the impact of a traditional approach based on TF-IDF and then examines advanced deep-learning-based methods. These methods include full fine-tuning of MARBERT as well as adapter-based fine-tuning of MARBERT, with and without data augmentation. The evaluation shows that the traditional TF-IDF approach achieves the best accuracy on the TEST-A dataset, while the adapter-fine-tuned MARBERT trained on augmented data ranks second in macro F1-score on the TEST-B dataset. As a result, the proposed system ranked second on average in the shared task.
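The abstract names TF-IDF as the traditional approach but does not state the feature granularity or the classifier paired with it. As a rough, dependency-free illustration only, the sketch below assumes character n-gram TF-IDF features (2 to 4 characters per word) and swaps in a nearest-centroid classifier scored by cosine similarity; it also computes accuracy and macro F1, the two metrics the abstract reports for TEST-A and TEST-B. The class `TfidfCentroidClassifier`, the helper functions, the label strings, and the six hand-written toy sentences are all hypothetical: they are not the authors' system and not NADI data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Character n-grams per word, padded with spaces to mark word boundaries."""
    grams = []
    for word in text.split():
        padded = f" {word} "
        for n in range(n_min, n_max + 1):
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def l2_normalize(vec):
    """Scale a sparse {feature: weight} vector to unit length (empty if all zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else {}


class TfidfCentroidClassifier:
    """Hypothetical TF-IDF + nearest-centroid classifier, for illustration only."""

    def fit(self, texts, labels):
        docs = [Counter(char_ngrams(t)) for t in texts]
        df = Counter()
        for doc in docs:
            df.update(doc.keys())
        n = len(docs)
        # Smoothed IDF, same formula as scikit-learn's TfidfVectorizer default.
        self.idf = {g: math.log((1 + n) / (1 + c)) + 1 for g, c in df.items()}
        # Sum the unit-length TF-IDF vectors of each class, then renormalize.
        sums = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for g, w in self._tfidf(doc).items():
                sums[label][g] += w
        self.centroids = {label: l2_normalize(v) for label, v in sums.items()}
        return self

    def _tfidf(self, counts):
        # Raw term counts times IDF; n-grams unseen in training are dropped.
        return l2_normalize({g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf})

    def predict(self, text):
        vec = self._tfidf(Counter(char_ngrams(text)))

        # Both vectors are unit length, so the dot product is cosine similarity.
        def cosine(label):
            centroid = self.centroids[label]
            return sum(w * centroid.get(g, 0.0) for g, w in vec.items())

        return max(self.centroids, key=cosine)


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    f1s = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


# Hand-written toy sentences (NOT NADI data), one country label each.
TRAIN_TEXTS = [
    "ازيك عامل ايه النهارده", "انا مش عارف اعمل ايه",
    "كيفك وش تسوي اليوم", "وش تبي الحين",
    "كيداير لاباس عليك", "بغيت نمشي دابا",
]
TRAIN_LABELS = ["Egypt", "Egypt", "Saudi_Arabia", "Saudi_Arabia", "Morocco", "Morocco"]

clf = TfidfCentroidClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
test_texts = ["مش عارف ايه", "بغيت نمشي"]
gold = ["Egypt", "Morocco"]
pred = [clf.predict(t) for t in test_texts]
print(pred, accuracy(gold, pred), macro_f1(gold, pred))
```

Macro F1 averages per-class scores without weighting by class frequency, so rare dialects count as much as frequent ones; this is one reason a system can rank differently on accuracy and on macro F1. In practice one would more likely pair a library vectorizer with a linear classifier such as logistic regression or a linear SVM; nearest-centroid is used here only to keep the sketch self-contained.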

Original language: English
Title of host publication: WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 420-424
Number of pages: 5
ISBN (Electronic): 9781959429272
Publication status: Published - 1 Jan 2022
Externally published: Yes
Event: 7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 8 Dec 2022 → …

Publication series

Name: WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop

Conference

Conference: 7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 8/12/22 → …
