TY - GEN
T1 - Arabic handwritten document preprocessing and recognition
AU - Chammas, Edgard
AU - Mokbel, Chafic
AU - Likforman-Sulem, Laurence
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/11/20
Y1 - 2015/11/20
N2 - Arabic handwritten documents present specific challenges due to the cursive nature of the writing and the presence of diacritical marks. Moreover, one of the largest labeled databases of Arabic handwritten documents, the OpenHaRT-NIST database, includes a specific kind of noise, namely guidelines, that has to be addressed. We propose several approaches to process these documents. First, a K-means-based guideline detection approach identifies the documents that include guidelines. We then propose a series of preprocessing steps at the text-line level to reduce the effects of noise. For text lines that include guidelines, a guideline removal step is described and existing keystroke restoration approaches are assessed. In addition, we propose a preprocessing step that combines noise removal and deskewing by removing line fragments from neighboring text lines while searching for the principal orientation of the text line. We provide recognition results showing the significant improvement brought by the proposed preprocessing.
AB - Arabic handwritten documents present specific challenges due to the cursive nature of the writing and the presence of diacritical marks. Moreover, one of the largest labeled databases of Arabic handwritten documents, the OpenHaRT-NIST database, includes a specific kind of noise, namely guidelines, that has to be addressed. We propose several approaches to process these documents. First, a K-means-based guideline detection approach identifies the documents that include guidelines. We then propose a series of preprocessing steps at the text-line level to reduce the effects of noise. For text lines that include guidelines, a guideline removal step is described and existing keystroke restoration approaches are assessed. In addition, we propose a preprocessing step that combines noise removal and deskewing by removing line fragments from neighboring text lines while searching for the principal orientation of the text line. We provide recognition results showing the significant improvement brought by the proposed preprocessing.
KW - Arabic Handwriting Recognition
KW - Guideline removal
KW - Handwritten Document preprocessing
KW - Noise removal
KW - OpenHaRT database
KW - Textline image Preprocessing
UR - https://www.scopus.com/pages/publications/84962533781
U2 - 10.1109/ICDAR.2015.7333802
DO - 10.1109/ICDAR.2015.7333802
M3 - Conference contribution
AN - SCOPUS:84962533781
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 451
EP - 455
BT - 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
PB - IEEE Computer Society
T2 - 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Y2 - 23 August 2015 through 26 August 2015
ER -