Text Line segmentation of historical Arabic documents

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper presents a text line segmentation method for printed or handwritten historical Arabic documents. Documents are first classified into two classes using a K-means scheme. These classes correspond to document complexity (easy or not easy to segment). A document that includes overlapping and touching characters is then divided into vertical strips. The text blocks extracted by horizontal projection are classified into three categories: small, average and large. After the large text blocks are segmented, the lines are obtained by matching adjacent blocks within two successive strips using their spatial relationship. A document without overlapping or touching characters is segmented in the same way, but the segmentation module for large text blocks is skipped. The method achieves 96% accuracy on a collection of 100 historical documents.
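The strip and projection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0/1 list-of-rows image representation, and the height thresholds (0.5 and 1.5 times the median block height) are all assumptions made for illustration, and the K-means classification, the splitting of large blocks, and the block matching across strips are not shown.

```python
from statistics import median


def split_into_strips(image, n_strips):
    """Split a binary image (list of rows, 1 = ink) into vertical strips."""
    width = len(image[0])
    # Column boundaries of each strip; the strip count is a free parameter here.
    bounds = [i * width // n_strips for i in range(n_strips + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in image]
            for i in range(n_strips)]


def text_blocks(strip):
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink."""
    # Horizontal projection: number of ink pixels in each row of the strip.
    profile = [sum(row) for row in strip]
    blocks, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a block begins
        elif count == 0 and start is not None:
            blocks.append((start, y))      # a block ends at an empty row
            start = None
    if start is not None:
        blocks.append((start, len(profile)))
    return blocks


def classify_blocks(blocks, low=0.5, high=1.5):
    """Label blocks small / average / large by height relative to the median.

    The thresholds `low` and `high` are hypothetical, not taken from the paper.
    """
    if not blocks:
        return []
    ref = median(bottom - top for top, bottom in blocks)
    labels = []
    for top, bottom in blocks:
        height = bottom - top
        if height < low * ref:
            labels.append("small")
        elif height > high * ref:
            labels.append("large")
        else:
            labels.append("average")
    return labels
```

In this sketch a "large" block would be a candidate for further splitting, since it likely spans several touching or overlapping lines, while "small" blocks would typically correspond to diacritics or noise.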

Original language: English
Title of host publication: Proceedings - 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Pages: 138-142
Number of pages: 5
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: 9th International Conference on Document Analysis and Recognition, ICDAR 2007 - Curitiba, Brazil
Duration: 23 Sept 2007 to 26 Sept 2007

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 1
ISSN (Print): 1520-5363

Conference

Conference: 9th International Conference on Document Analysis and Recognition, ICDAR 2007
Country/Territory: Brazil
City: Curitiba
Period: 23/09/07 to 26/09/07
