Web document analysis based on visual segmentation and page rendering

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper proposes an approach for segmenting a Web page into its semantic parts. Such analysis may be useful for adapting blogs or other pages to small devices. The approach takes advantage of both the dynamic layout obtained after rendering and textual information. The method first segments the page into blocks and then classifies the blocks into semantic parts with an SVM-based machine learning approach that uses a set of 30 textual and visual features. Evaluation is conducted on a Web blog database. Results are provided for both block classification and blog segmentation into articles.

Original language: English
Title of host publication: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Pages: 354-358
Number of pages: 5
Publication status: Published - 24 May 2012
Event: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012 - Gold Coast, QLD, Australia
Duration: 27 Mar 2012 to 29 Mar 2012

Publication series

Name: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012

Conference

Conference: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Country/Territory: Australia
City: Gold Coast, QLD
Period: 27/03/12 to 29/03/12

Keywords

  • Internet document
  • Web page segmentation
  • block segmentation
  • semantic block
