Handwriting recognition of historical documents with few labeled data

Edgard Chammas, Chafic Mokbel, Laurence Likforman-Sulem

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Historical documents pose many challenges for offline handwritten text recognition (HTR) systems, among them the segmentation and labeling steps. Training an HTR system requires carefully annotated text lines, yet in some scenarios transcripts are available only at the paragraph level, with no text-line information. In this work, we demonstrate how to train an HTR system with little labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) on only 10% of a dataset's manually labeled text-line data and propose an incremental training procedure that covers the rest of the data. Performance is further improved by augmenting the training set with specially crafted multi-scale data. We also propose a model-based normalization scheme that accounts for variability in writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second-best result in the ICDAR2017 competition [1].
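The abstract does not spell out how the multi-scale data or the normalization scheme are built, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that multi-scale data means rescaled copies of each text-line image, and that the model-based normalization picks, at recognition time, the scale at which the recognizer reports the highest score. The names `rescale`, `multi_scale_variants`, `recognize_best_scale`, and the `recognizer` callback are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Grayscale text-line image stored as rows of pixel values (an assumption
# made to keep the sketch dependency-free).
Image = List[List[int]]


def rescale(image: Image, factor: float) -> Image:
    """Nearest-neighbour rescaling of a line image by `factor` (hypothetical helper)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [
            image[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


def multi_scale_variants(image: Image, factors: Sequence[float]) -> List[Image]:
    """Training-time augmentation: one rescaled copy of the line per factor."""
    return [rescale(image, f) for f in factors]


def recognize_best_scale(
    image: Image,
    factors: Sequence[float],
    recognizer: Callable[[Image], Tuple[str, float]],
) -> Tuple[str, float, float]:
    """Recognition-time scale selection (assumed reading of 'model-based').

    Decodes the line at every candidate scale and keeps the transcript whose
    score, as reported by `recognizer`, is highest.
    Returns (text, score, chosen_factor).
    """
    best = None
    for f in factors:
        text, score = recognizer(rescale(image, f))
        if best is None or score > best[1]:
            best = (text, score, f)
    if best is None:
        raise ValueError("factors must not be empty")
    return best
```

In this sketch `recognizer` stands in for a trained CRNN that returns a transcript and a confidence such as a log-probability; the same list of scale factors could be shared between augmentation and recognition, though the source does not say whether that is the case.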

Original language: English
Title of host publication: Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 43-48
Number of pages: 6
ISBN (Electronic): 9781538633465
Publication status: Published - 22 Jun 2018
Externally published: Yes
Event: 13th IAPR International Workshop on Document Analysis Systems, DAS 2018 - Vienna, Austria
Duration: 24 Apr 2018 - 27 Apr 2018

Publication series

Name: Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018

Conference

Conference: 13th IAPR International Workshop on Document Analysis Systems, DAS 2018
Country/Territory: Austria
City: Vienna
Period: 24/04/18 - 27/04/18

Keywords

  • CRNN
  • handwriting recognition
  • historical documents
  • limited labeled data
  • model-based normalization scheme
  • multi-scale training
  • variability
