Abstract
The World Wide Web, with websites and webpages as its main interface, facilitates the dissemination of important information. It is therefore crucial to optimize webpage design for better user interaction, which is primarily done by analyzing users’ behavior, especially their eye-gaze locations on the webpage. However, gathering such data remains labor- and time-intensive. In this work, we enable automatic eye-gaze estimation from webpage screenshots by curating a unified dataset consisting of webpage screenshots, eye-gaze heatmaps, and website layout information in the form of image and text masks. This dataset allows us to propose a deep learning-based model that leverages both the webpage screenshot and content information (the spatial locations of images and text), combined through an attention mechanism for effective eye-gaze prediction. Our experiments show the benefit of careful fine-tuning on our unified dataset for improving the accuracy of eye-gaze predictions. We further observe that the model learns to focus on the targeted areas (images and text) to achieve accurate eye-gaze predictions. Finally, comparison with alternative approaches shows that our method achieves state-of-the-art results, establishing a benchmark for the webpage-based eye-gaze prediction task.
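The abstract describes the fusion step only at a high level. Purely as an illustration, the minimal PyTorch sketch below shows one way screenshot features could be combined with image/text layout masks through a spatial attention mechanism; all module names, shapes, and design choices are assumptions for illustration, not the authors' actual architecture.

```python
# Hypothetical sketch of attention-based fusion of screenshot features with
# image/text layout masks; NOT the paper's actual implementation.
import torch
import torch.nn as nn


class GazeFusionNet(nn.Module):
    """Predict an eye-gaze heatmap from a screenshot plus image/text masks."""

    def __init__(self, channels=32):
        super().__init__()
        # Shallow encoder for the RGB screenshot.
        self.screen_enc = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        # Encoder for the stacked image/text masks (2 channels).
        self.mask_enc = nn.Sequential(
            nn.Conv2d(2, channels, 3, padding=1), nn.ReLU())
        # Spatial attention map: layout features gate the screenshot features.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 1), nn.Sigmoid())
        # Decode attended features into a single-channel gaze heatmap.
        self.head = nn.Conv2d(channels, 1, 1)

    def forward(self, screenshot, image_mask, text_mask):
        feats = self.screen_enc(screenshot)                  # (B, C, H, W)
        masks = self.mask_enc(torch.cat([image_mask, text_mask], dim=1))
        attn = self.attn(torch.cat([feats, masks], dim=1))   # (B, 1, H, W)
        return torch.sigmoid(self.head(feats * attn))        # (B, 1, H, W)


# Example forward pass on a 256x256 screenshot with matching masks.
model = GazeFusionNet()
heatmap = model(torch.rand(1, 3, 256, 256),
                torch.rand(1, 1, 256, 256),
                torch.rand(1, 1, 256, 256))
print(heatmap.shape)  # torch.Size([1, 1, 256, 256])
```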
| Original language | English |
|---|---|
| Pages (from-to) | 121-132 |
| Number of pages | 12 |
| Journal | Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications |
| Volume | 4 |
| DOIs | |
| Publication status | Published - 1 Jan 2023 |
| Externally published | Yes |
| Event | 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2023, Lisbon, Portugal |
| Duration | 19 Feb 2023 → 21 Feb 2023 |
Keywords
- Eye-Gaze Saliency
- Image Translation
- Visual Attention