TY - GEN
T1 - Document Segmentation for WebAR application
AU - Lelong, Thibault
AU - Preda, Marius
AU - Zaharia, Titus
N1 - Publisher Copyright: © 2022 Owner/Author.
PY - 2022/11/2
Y1 - 2022/11/2
N2 - In recent years, we have witnessed the appearance of consumer applications of Augmented Reality (AR) available natively on smartphones. More recently, these applications have also been implemented in web browsers. Among the various AR applications, a simple one consists of detecting a target object filmed by the phone and triggering an event upon detection. The target object can be of any kind, including 3D objects or simpler documents and printed pictures. The underlying process consists of comparing the image captured by the camera against a large-scale remote image database. The goal is then to display new content over the target object while preserving the 3D spatial registration. When the target object is a document (or printed picture), the image captured by the camera contains, in many cases, a lot of useless information (such as the background). It is therefore more efficient to segment the captured image and send only the representation of the target object to the server. In this paper, we propose a deep-learning (DL) based method for fast detection and segmentation of printed documents within natural images. The goal is to provide a light and fast DL model to be used directly in the web browser, on mobile devices. We designed a compact and fast DL architecture that keeps the same accuracy as the reference architecture while dividing the inference time by 3 and the number of parameters by 10.
AB - In recent years, we have witnessed the appearance of consumer applications of Augmented Reality (AR) available natively on smartphones. More recently, these applications have also been implemented in web browsers. Among the various AR applications, a simple one consists of detecting a target object filmed by the phone and triggering an event upon detection. The target object can be of any kind, including 3D objects or simpler documents and printed pictures. The underlying process consists of comparing the image captured by the camera against a large-scale remote image database. The goal is then to display new content over the target object while preserving the 3D spatial registration. When the target object is a document (or printed picture), the image captured by the camera contains, in many cases, a lot of useless information (such as the background). It is therefore more efficient to segment the captured image and send only the representation of the target object to the server. In this paper, we propose a deep-learning (DL) based method for fast detection and segmentation of printed documents within natural images. The goal is to provide a light and fast DL model to be used directly in the web browser, on mobile devices. We designed a compact and fast DL architecture that keeps the same accuracy as the reference architecture while dividing the inference time by 3 and the number of parameters by 10.
KW - Document segmentation
KW - Image segmentation
KW - fully convolutional network
KW - web application
U2 - 10.1145/3564533.3564570
DO - 10.1145/3564533.3564570
M3 - Conference contribution
AN - SCOPUS:85142632667
T3 - Proceedings - Web3D 2022: 27th ACM Conference on 3D Web Technology
BT - Proceedings - Web3D 2022
A2 - Spencer, Stephen N.
PB - Association for Computing Machinery, Inc
T2 - 27th ACM Conference on 3D Web Technology, Web3D 2022
Y2 - 2 November 2022 through 4 November 2022
ER -