TY - GEN
T1 - Stroke width exploitation to improve automatic recognition of Arabic handwritten texts
AU - Chammas, Edgard
AU - Likforman-Sulem, Laurence
AU - Mokbel, Chafic
N1 - Publisher Copyright:
© 2017 IEEE
PY - 2017/10/13
Y1 - 2017/10/13
AB - Several inherent factors, such as writing size and stroke width, increase the complexity of automatic handwritten document recognition. In previous work [1], we showed that exploiting writing size improves recognition performance. In this work, we consider stroke width as a modeling factor to improve the performance of automatic recognition systems. Experiments were conducted on Arabic handwritten documents from NIST OpenHaRT, one of the largest labeled Arabic handwriting databases, which exhibits large variability in stroke width. We propose several approaches to handle this variability in both the training and recognition phases. Initial experiments show that recognition performance is strongly affected by stroke width. To account for this parameter, we propose classifying the data into three classes according to stroke width. In the recognition phase, we thicken each text-line image into several versions with predefined stroke width values and then combine the recognition scores obtained for each version. This approach yields significant performance gains for both HMM-based and BLSTM-based recognition systems. In addition, we use synthetic data to adapt HMM models to different stroke widths, and we obtain further performance gains by combining the results of the adapted models with two different methods (ROVER and trellis). We report recognition results showing the benefits of exploiting stroke width and compare them with a known stroke width normalization approach.
KW - Adaptation
KW - Arabic handwriting recognition
KW - OpenHaRT database
KW - Stroke width
KW - Synthetic data
U2 - 10.1109/ASAR.2017.8067763
DO - 10.1109/ASAR.2017.8067763
M3 - Conference contribution
AN - SCOPUS:85070337276
T3 - 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
SP - 74
EP - 78
BT - 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
Y2 - 3 April 2017 through 5 April 2017
ER -