Arabic handwritten document preprocessing and recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Arabic handwritten documents present specific challenges due to the cursive nature of the writing and the presence of diacritical marks. Moreover, one of the largest labeled databases of Arabic handwritten documents, the OpenHaRT-NIST database, includes a specific kind of noise, namely guidelines, that has to be addressed. We propose several approaches to process these documents. First, a guideline detection approach based on K-means has been developed to detect the documents that include guidelines. We then propose a series of preprocessing steps at the text-line level to reduce the effects of noise. For text lines that include guidelines, a guideline removal preprocessing step is described and existing keystroke restoration approaches are assessed. In addition, we propose a preprocessing step that combines noise removal and deskewing by removing line fragments from neighboring text lines while searching for the principal orientation of the text line. We provide recognition results showing the significant improvement brought by the proposed preprocessing steps.
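
As a rough illustration of the guideline detection idea, the sketch below clusters pages into two groups with a plain 1-D K-means. The abstract states only that detection is based on K-means; the feature used here (the largest fraction of ink pixels in any single row), the choice of two clusters, and every function name are assumptions made for illustration, not the method of the paper.

```python
import numpy as np

def ruling_score(binary_page):
    """Hypothetical feature: largest fraction of ink pixels in any one row.

    binary_page is a 2-D array where nonzero/True means ink. A printed
    guideline spans most of the page width, so it pushes this score up.
    """
    return float(np.asarray(binary_page, dtype=float).mean(axis=1).max())

def two_means_1d(values, iters=20):
    """Plain 1-D K-means with k=2, initialised at the min and max value.

    Returns (labels, centers); label 1 is the cluster with the higher center.
    """
    values = np.asarray(values, dtype=float)
    centers = np.array([values.min(), values.max()])
    labels = np.zeros(len(values), dtype=int)
    for _ in range(iters):
        # Assign each value to the nearer center (ties go to cluster 0).
        labels = (np.abs(values - centers[1]) < np.abs(values - centers[0])).astype(int)
        # Recompute each center as the mean of its members.
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels, centers

def detect_guideline_pages(pages):
    """Return a boolean array: True where a page falls in the high-score cluster."""
    scores = [ruling_score(p) for p in pages]
    labels, _ = two_means_1d(scores)
    return labels.astype(bool)
```

With only two clusters, the approach always splits the collection, so it presumes that both ruled and unruled pages are present. If every page has the same score, all pages are placed in cluster 0 and none is flagged.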

Original language: English
Title of host publication: 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
Publisher: IEEE Computer Society
Pages: 451-455
Number of pages: 5
ISBN (Electronic): 9781479918058
Publication status: Published - 20 Nov 2015
Externally published: Yes
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: 23 Aug 2015 to 26 Aug 2015

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2015-November
ISSN (Print): 1520-5363

Conference

Conference: 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Country/Territory: France
City: Nancy
Period: 23/08/15 to 26/08/15

Keywords

  • Arabic Handwriting Recognition
  • Guideline removal
  • Handwritten Document preprocessing
  • Noise removal
  • OpenHaRT database
  • Textline image Preprocessing
