TY - GEN
T1 - Text line segmentation of historical Arabic documents
AU - Zahour, Abderrazak
AU - Likforman-Sulem, Laurence
AU - Boussalaa, Wafa
AU - Taconet, Bruno
PY - 2007/12/1
Y1 - 2007/12/1
AB - This paper presents a text line segmentation method for printed or handwritten historical Arabic documents. Documents are first classified into two classes using a K-means scheme; the classes correspond to document complexity (easy or not easy to segment). A document that includes overlapping and touching characters is then divided into vertical strips. The text blocks extracted from each strip by horizontal projection are classified into three categories: small, average and large. After the large text blocks are segmented, the lines are obtained by matching adjacent blocks in two successive strips using their spatial relationship. A document without overlapping or touching characters is segmented in the same way, except that the module for segmenting large text blocks is skipped. The method achieves 96% accuracy on a collection of 100 historical documents.
UR - https://www.scopus.com/pages/publications/51149106989
U2 - 10.1109/ICDAR.2007.4378691
DO - 10.1109/ICDAR.2007.4378691
M3 - Conference contribution
AN - SCOPUS:51149106989
SN - 0769528228
SN - 9780769528229
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 138
EP - 142
BT - Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007
T2 - 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Y2 - 23 September 2007 through 26 September 2007
ER -