Abstract
Several inherent factors, such as writing size, increase the complexity of automatic handwritten document recognition. In this work, we take such factors into account during modeling in order to improve the performance of automatic systems. The experiments were conducted on Arabic handwritten documents from NIST-OpenHaRT, one of the largest labeled Arabic handwriting databases. The database exhibits large inter- and intra-variability in text size. We propose several approaches to handle this variability in both the training and recognition phases. Initial experiments show that recognition performance is strongly affected by writing size. To account for this parameter, we propose to classify the data into three classes according to writing size. In the recognition phase, we resized each text-line image to several predefined sizes and then combined the recognition scores obtained at each size. This approach yields significant performance gains for both an HMM-based and a BLSTM-based recognition system. In addition, we used synthetic data to adapt the HMM models to different scales. We obtained further performance gains by combining the outputs of the adapted models with two different methods (ROVER and trellis combination). We report recognition results that show the benefits of exploiting writing size.
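The abstract gives no implementation details, so the following is only a minimal Python sketch of the two size-related ideas it describes: assigning a text line to one of three writing-size classes, and recognizing a line at several predefined sizes before combining the scores. Everything concrete here is a hypothetical illustration, not the authors' method: the size thresholds, the target heights, the nearest-neighbour resize, the `recognize` callable (assumed to return posterior probabilities per hypothesis) and the choice to sum those posteriors across scales.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]  # grayscale text-line image, stored as rows of pixels


def size_class(line_height: float, thresholds: Tuple[float, float] = (40.0, 80.0)) -> str:
    """Assign a line to one of three writing-size classes (thresholds are hypothetical)."""
    low, high = thresholds
    if line_height < low:
        return "small"
    if line_height < high:
        return "medium"
    return "large"


def resize_nearest(img: Image, target_height: int) -> Image:
    """Nearest-neighbour resize to target_height, preserving the aspect ratio."""
    h, w = len(img), len(img[0])
    target_width = max(1, round(w * target_height / h))
    return [
        [img[r * h // target_height][c * w // target_width] for c in range(target_width)]
        for r in range(target_height)
    ]


def multiscale_recognize(
    img: Image,
    recognize: Callable[[Image], Dict[str, float]],  # hypothetical recognizer: hypothesis -> posterior
    heights: Sequence[int] = (32, 48, 64),           # hypothetical predefined sizes
) -> str:
    """Recognize the line at every predefined height and sum posteriors per hypothesis."""
    totals: Dict[str, float] = defaultdict(float)
    for target in heights:
        for hypothesis, posterior in recognize(resize_nearest(img, target)).items():
            totals[hypothesis] += posterior  # a hypothesis missing at a scale contributes 0
    return max(totals, key=totals.get)
```

Summing posteriors rather than log-scores is a deliberate choice in this sketch: a hypothesis absent from one scale's n-best list simply contributes zero there instead of receiving an undefined log-score. For example, with a toy recognizer that prefers "A" on images at most 40 pixels high and "B" otherwise, the heights (32, 48, 64) give "B" a larger total.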
| Original language | French |
|---|---|
| Pages (from - to) | 95-115 |
| Number of pages | 21 |
| Journal | Document Numérique |
| Volume | 19 |
| Issue number | 2-3 |
| Status | Published, 21 Dec. 2016 |
| External modification | Yes |
Keywords
- Adaptation
- Arabic handwriting recognition
- OpenHaRT database
- Synthetic data
- Writing scale