Evaluation of Feature-Embedding Methods for Word Spotting in Historical Arabic Documents

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Retrieving and indexing historical Arabic documents remain a significant challenge. This paper compares feature representation spaces for word spotting in historical Arabic documents. We build embedding spaces with two families of machine learning methods: i) linear methods, namely principal component analysis and linear discriminant analysis, and ii) non-linear methods, namely convolutional neural networks trained in triplet and Siamese configurations. Each word image is then represented by a dense vector, and feature representations are matched using the Euclidean distance. The resulting word embedding models are evaluated on the VML-HD dataset, and the experiments show that the non-linear methods are more effective than the linear ones.
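The abstract describes a query-by-example pipeline: each word image is mapped to a dense vector, and candidates are ranked by Euclidean distance to the query. The following is a minimal sketch of that pipeline under stated assumptions, not the authors' implementation. The word-image descriptors are hypothetical random vectors rather than VML-HD data; PCA (computed with NumPy's SVD) stands in for the linear embeddings (LDA would replace the projection with class-discriminative directions and is not shown); and `triplet_loss` shows the standard triplet margin objective commonly used to train triplet networks. The paper's actual features, network architecture, margin and embedding size are not stated in the abstract, and all function names here are hypothetical.

```python
import numpy as np


def fit_pca(features: np.ndarray, n_components: int):
    """Fit a linear PCA embedding; returns the feature mean and a (d, k) projection."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components].T


def embed(features: np.ndarray, mean: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Map raw word-image descriptors to dense embedding vectors."""
    return (features - mean) @ projection


def rank_by_euclidean(query_vec: np.ndarray, gallery_vecs: np.ndarray):
    """Return gallery indices sorted from nearest to farthest, with their distances."""
    dists = np.linalg.norm(gallery_vecs - query_vec, axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists[order]


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Triplet margin loss on Euclidean distances, averaged over the batch.

    Pushes d(anchor, positive) below d(anchor, negative) by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Hypothetical data: 200 word images, each described by a 64-D feature vector.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(200, 64))
mean, proj = fit_pca(gallery, n_components=16)
gallery_emb = embed(gallery, mean, proj)

# Query: a slightly perturbed copy of gallery item 0 (simulating another instance of the same word).
query = gallery[:1] + 0.01 * rng.normal(size=(1, 64))
query_emb = embed(query, mean, proj)[0]
order, dists = rank_by_euclidean(query_emb, gallery_emb)
print("top-5 matches:", order[:5])
```

In the non-linear setting, `embed` would be replaced by a trained CNN, and a loss such as `triplet_loss` (or a contrastive loss for the Siamese variant) would shape the space so that instances of the same word lie close together; the Euclidean ranking step stays the same.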

Original language: English
Title of host publication: Proceedings of the 17th International Multi-Conference on Systems, Signals and Devices, SSD 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 34-39
Number of pages: 6
ISBN (Electronic): 9781728110806
DOIs
Publication status: Published - 20 Jul 2020
Event: 17th International Multi-Conference on Systems, Signals and Devices, SSD 2020 - Sfax, Tunisia
Duration: 20 Jul 2020 to 23 Jul 2020

Publication series

Name: Proceedings of the 17th International Multi-Conference on Systems, Signals and Devices, SSD 2020

Conference

Conference: 17th International Multi-Conference on Systems, Signals and Devices, SSD 2020
Country/Territory: Tunisia
City: Sfax
Period: 20/07/20 to 23/07/20

Keywords

  • Feature embedding
  • Historical Arabic documents
  • Word spotting
