Overlapping and multi-touching text-line segmentation by block covering analysis

Abderrazak Zahour, Bruno Taconet, Laurence Likforman-Sulem, Wafa Boussellaa

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a new approach to text-line segmentation based on Block Covering that solves the problem of overlapping and multi-touching components. Block Covering is the core of a system that processes a set of ancient Arabic documents from historical archives. The system is designed to separate text-lines even when they are overlapping and multi-touching. We exploit the Block Covering technique in three steps: a new fractal analysis (Block Counting) for document classification, a statistical analysis of block heights for block classification, and a neighboring analysis for building text-lines. The Block Counting fractal analysis, combined with a fuzzy C-means scheme, is performed on document images to classify them according to their complexity: tightly (closely) spaced documents (TSD) or widely spaced documents (WSD). An optimal Block Covering is applied to TSD documents, which include overlapping and multi-touching lines. The large blocks generated by the covering are then segmented on the basis of the statistical analysis of block heights. The final labeling into text-lines is based on a block neighboring analysis. Experimental results obtained on images from the Tunisian Historical Archives show the feasibility of the Block Covering technique for segmenting ancient Arabic documents.
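The document-classification step (a fractal measure per page, then fuzzy C-means into two groups) can be illustrated with a minimal Python sketch. It assumes the classic box-counting estimate of fractal dimension as a stand-in for the paper's Block Counting analysis, whose exact definition is not given here, and a standard fuzzy C-means on that single feature. The function names, the box sizes (1, 2, 4, 8), the fuzzifier m = 2, and the rule that the higher-dimension cluster corresponds to TSD are all hypothetical choices for illustration, not the authors' implementation.

```python
import math

def box_count(img, size):
    """Number of size x size boxes containing at least one foreground pixel (1)."""
    h, w = len(img), len(img[0])
    count = 0
    for top in range(0, h, size):
        for left in range(0, w, size):
            if any(img[r][c]
                   for r in range(top, min(top + size, h))
                   for c in range(left, min(left + size, w))):
                count += 1
    return count

def box_counting_dimension(img, sizes=(1, 2, 4, 8)):
    """Slope of log N(s) versus log(1/s), fitted by ordinary least squares."""
    counts = [box_count(img, s) for s in sizes]
    if min(counts) == 0:
        raise ValueError("image has no foreground pixels")
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(n) for n in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-9):
    """Plain fuzzy C-means on scalar features; returns (centres, memberships)."""
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly over the data range.
    centres = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    u = []
    for _ in range(iters):
        u = []
        p = 2.0 / (m - 1.0)
        for x in values:
            d = [abs(x - ck) for ck in centres]
            zeros = [k for k in range(c) if d[k] == 0.0]
            if zeros:
                # Point sits exactly on a centre: share full membership among those centres.
                row = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                row = [1.0 / sum((d[k] / d[j]) ** p for j in range(c)) for k in range(c)]
            u.append(row)
        # Centre update: membership-weighted mean with weights u^m.
        new = []
        for k in range(c):
            wts = [row[k] ** m for row in u]
            new.append(sum(wt * x for wt, x in zip(wts, values)) / sum(wts))
        shift = max(abs(a - b) for a, b in zip(new, centres))
        centres = new
        if shift < tol:
            break
    return centres, u

def label_documents(dims):
    """Hypothetical mapping: the cluster with the higher centre is called TSD."""
    centres, u = fuzzy_c_means_1d(dims, c=2)
    tsd = max(range(2), key=lambda k: centres[k])
    return ["TSD" if max(range(2), key=lambda k: row[k]) == tsd else "WSD"
            for row in u]

if __name__ == "__main__":
    solid = [[1] * 8 for _ in range(8)]                 # area-filling pattern
    line = [[1] * 8] + [[0] * 8 for _ in range(7)]      # a single horizontal stroke
    print(round(box_counting_dimension(solid), 6))
    print(round(box_counting_dimension(line), 6))
    print(label_documents([1.55, 1.58, 1.60, 1.82, 1.85, 1.88]))
```

On the toy images, the filled square gives a dimension of 2 and the single stroke gives 1, which is the intuition behind using such a measure to separate dense pages from sparse ones. Fuzzy C-means is used rather than a hard threshold so that each page keeps a graded membership in both classes; only the final label takes the arg-max.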

Original language: English
Pages (from-to): 335-351
Number of pages: 17
Journal: Pattern Analysis and Applications
Volume: 12
Issue number: 4
Publication status: Published - 1 Oct 2009
Externally published: Yes

Keywords

  • Ancient Arabic documents
  • Block Counting
  • Block covering
  • Overlapping and multi-touching lines
  • Text-line segmentation
