TY - GEN
T1 - Evaluation of Feature-Embedding Methods for Word Spotting in Historical Arabic Documents
AU - Fathallah, Abir
AU - Khedher, Mohamed Ibn
AU - El-Yacoubi, Mounim A.
AU - Ben Amara, Najoua Essoukri
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/7/20
Y1 - 2020/7/20
AB - Retrieving and indexing historical Arabic documents remain a significant challenge. This paper compares feature representation spaces for word spotting in historical Arabic documents. Embedding spaces are built with different machine learning methods: i) linear methods, namely principal component analysis and linear discriminant analysis, and ii) non-linear methods, namely triplet and Siamese convolutional neural networks. Each word image is then represented by a dense vector, and feature representations are matched using the Euclidean distance. We evaluate these embedding models on the VML-HD dataset, and the experiments show that the non-linear methods are more effective than the linear ones.
KW - Feature embedding
KW - Historical Arabic documents
KW - Word spotting
U2 - 10.1109/SSD49366.2020.9364134
DO - 10.1109/SSD49366.2020.9364134
M3 - Conference contribution
AN - SCOPUS:85103008342
T3 - Proceedings of the 17th International Multi-Conference on Systems, Signals and Devices, SSD 2020
SP - 34
EP - 39
BT - Proceedings of the 17th International Multi-Conference on Systems, Signals and Devices, SSD 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th International Multi-Conference on Systems, Signals and Devices, SSD 2020
Y2 - 20 July 2020 through 23 July 2020
ER -