Reconnaissance de mots manuscrits hors-vocabulaire en utilisant des ressources web

Cristina Oprean, Chafic Mokbel, Laurence Likforman-Sulem, Adrian Popescu

Research output: Contribution to journal › Article › peer-review

Abstract

Handwriting recognition systems rely on predefined dictionaries. Small, static dictionaries are often used to obtain high in-vocabulary (IV) accuracy at the expense of coverage, so out-of-vocabulary (OOV) words are not handled efficiently. To improve OOV recognition while keeping IV dictionaries small, we introduce a multi-step approach that exploits web resources. After an IV-OOV classification step, Wikipedia is used to create dynamic dictionaries adapted to each OOV sequence. A second decoding pass is then performed with the dynamic dictionary to determine the most probable word for the OOV sequence. We validate the approach with experiments on the RIMES dataset using a BLSTM recognizer. Results show improvements over handwriting recognition with a static dictionary.
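
The multi-step flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the confidence threshold used for IV-OOV classification, the length-based filter used to build the dynamic dictionary, and the string-similarity ranking that stands in for the second BLSTM decoding pass are all assumptions made for illustration, as are the function names and the `web_vocabulary` list that plays the role of words harvested from Wikipedia.

```python
from difflib import SequenceMatcher


def classify_iv_oov(confidence, threshold=0.5):
    """Label a first-pass hypothesis as IV or OOV.

    Assumption: a simple confidence threshold; the paper's actual
    classification criterion is not given in the abstract.
    """
    return "IV" if confidence >= threshold else "OOV"


def build_dynamic_dictionary(hypothesis, web_vocabulary, max_length_gap=2):
    """Keep web-derived words whose length is close to the hypothesis.

    Assumption: `web_vocabulary` is a plain list of words standing in
    for vocabulary extracted from Wikipedia.
    """
    return [
        word for word in web_vocabulary
        if abs(len(word) - len(hypothesis)) <= max_length_gap
    ]


def second_decoding(hypothesis, dynamic_dictionary):
    """Pick the dictionary word most similar to the hypothesis.

    Assumption: string similarity replaces a real second decoding
    pass, which would rescore the handwriting with the recognizer.
    """
    if not dynamic_dictionary:
        return hypothesis
    return max(
        dynamic_dictionary,
        key=lambda word: SequenceMatcher(None, hypothesis, word).ratio(),
    )


def recognize(hypothesis, confidence, web_vocabulary, threshold=0.5):
    """Run the three steps: classify, build dictionary, decode again."""
    if classify_iv_oov(confidence, threshold) == "IV":
        return hypothesis
    candidates = build_dynamic_dictionary(hypothesis, web_vocabulary)
    return second_decoding(hypothesis, candidates)


if __name__ == "__main__":
    vocabulary = ["wikipedia", "encyclopedia", "web"]
    print(recognize("wikpedia", 0.2, vocabulary))
    print(recognize("bonjour", 0.9, vocabulary))
```

In this sketch a high-confidence hypothesis is returned unchanged, while a low-confidence one is re-ranked against a dictionary built only for that sequence, which mirrors the idea of keeping the static IV dictionary small.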

Original language: French
Pages (from-to): 77-96
Number of pages: 20
Journal: Document Numerique
Volume: 17
Issue number: 3
Publication status: Published - 1 Jan 2014
Externally published: Yes
