BLSTM-based handwritten text recognition using Web resources

Cristina Oprean, Laurence Likforman-Sulem, Chafic Mokbel, Adrian Popescu

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Handwriting recognition systems usually rely on static dictionaries and language models. Full coverage by these dictionaries is generally not achieved when dealing with unrestricted document corpora, due to the presence of Out-Of-Vocabulary (OOV) words. In previous work, dynamic dictionaries were built from Web resources and successfully applied to isolated word recognition. In the present work we extend this approach to text-line recognition. Exploiting dynamic dictionaries requires segmenting lines into words, which is performed using BLSTM classifiers to align filler models and word sequence outputs. Words are then classified, based on their confidence scores, into anchor and non-anchor words (AWs and NAWs). AWs are equated to the BLSTM outputs and used as such. Dynamic dictionaries are built for NAWs by exploiting Web resources for their character sequences and for neighboring AWs. Text-lines are decoded again using the dynamic dictionaries and a re-estimated language model. We conduct experiments on the publicly available RIMES database and show that the introduction of the dynamic dictionary is beneficial. Equally important, we show that the gain increases as the proportion of OOVs increases.
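
The anchor/non-anchor split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how confidence scores are computed or which threshold is used, so the `Word` type, the 0.8 threshold, the one-word context window, and the sample line are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str          # character sequence output by the recognizer
    confidence: float  # assumed to lie in [0, 1]; the real scale is not given


def split_anchor_words(words: List[Word],
                       threshold: float = 0.8) -> List[Tuple[Word, bool]]:
    """Pair each word with True (anchor) or False (non-anchor).

    The threshold value is an assumption, not taken from the paper.
    """
    return [(w, w.confidence >= threshold) for w in words]


def neighboring_anchors(labeled: List[Tuple[Word, bool]],
                        index: int,
                        window: int = 1) -> List[str]:
    """Return texts of anchor words within `window` positions of `index`."""
    lo = max(0, index - window)
    hi = min(len(labeled), index + window + 1)
    return [w.text
            for i, (w, is_anchor) in enumerate(labeled[lo:hi], start=lo)
            if is_anchor and i != index]


# Hypothetical recognizer output for one text line.
line = [Word("la", 0.95), Word("facturr", 0.41), Word("jointe", 0.90)]
labeled = split_anchor_words(line)

# Indices of non-anchor words, i.e. those that would get a dynamic dictionary.
naw_indices = [i for i, (_, is_anchor) in enumerate(labeled) if not is_anchor]

# Anchor words adjacent to the first non-anchor word.
context = neighboring_anchors(labeled, naw_indices[0])
```

In this sketch, the low-confidence word and its neighboring anchors are the two pieces of information the abstract says are used to query Web resources; how those queries are formed and how the dictionary is assembled is left out because the abstract does not specify it.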

Original language: English
Title of host publication: 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
Publisher: IEEE Computer Society
Pages: 466-470
Number of pages: 5
ISBN (Electronic): 9781479918058
Publication status: Published - 20 Nov 2015
Externally published: Yes
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: 23 Aug 2015 to 26 Aug 2015

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2015-November
ISSN (Print): 1520-5363

Conference

Conference: 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Country/Territory: France
City: Nancy
Period: 23/08/15 to 26/08/15
