TY - GEN
T1 - TF-IDF or Transformers for Arabic Dialect Identification? ITFLOWS participation in the NADI 2022 Shared Task
AU - Shammary, Fouad
AU - Chen, Yiyi
AU - Kardkovács, Zsolt T.
AU - Afli, Haithem
AU - Alam, Mehwish
N1 - Publisher Copyright: © 2022 Association for Computational Linguistics.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - This study targets the shared task of Nuanced Arabic Dialect Identification (NADI) organized with the Workshop on Arabic Natural Language Processing (WANLP). It focuses on Subtask 1: the identification of Arabic dialects at the country level. More specifically, it studies the impact of a traditional approach such as TF-IDF and then moves on to study the impact of advanced deep-learning-based methods. These methods include full fine-tuning of MARBERT as well as adapter-based fine-tuning of MARBERT, with and without data augmentation. The evaluation shows that the traditional approach based on TF-IDF scores best in terms of accuracy on the TEST-A dataset, while the adapter-based fine-tuned MARBERT trained on augmented data ranks second in Macro F1-score on the TEST-B dataset. As a result, the proposed system ranked second on average in the shared task.
AB - This study targets the shared task of Nuanced Arabic Dialect Identification (NADI) organized with the Workshop on Arabic Natural Language Processing (WANLP). It focuses on Subtask 1: the identification of Arabic dialects at the country level. More specifically, it studies the impact of a traditional approach such as TF-IDF and then moves on to study the impact of advanced deep-learning-based methods. These methods include full fine-tuning of MARBERT as well as adapter-based fine-tuning of MARBERT, with and without data augmentation. The evaluation shows that the traditional approach based on TF-IDF scores best in terms of accuracy on the TEST-A dataset, while the adapter-based fine-tuned MARBERT trained on augmented data ranks second in Macro F1-score on the TEST-B dataset. As a result, the proposed system ranked second on average in the shared task.
UR - https://www.scopus.com/pages/publications/85141302540
U2 - 10.18653/v1/2022.wanlp-1.42
DO - 10.18653/v1/2022.wanlp-1.42
M3 - Conference contribution
AN - SCOPUS:85141302540
T3 - WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop
SP - 420
EP - 424
BT - WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022
Y2 - 8 December 2022
ER -