ObjectRunner: Lightweight, targeted extraction and querying of structured web data

Talel Abdessalem, Bogdan Cautis, Nora Derouiche

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper we present ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTML pages (the so-called structured Web). It illustrates a two-phase querying of the Web, in which an intensional description of the targeted data is first provided, in a flexible and widely applicable manner. ObjectRunner then follows a lightweight, best-effort approach, leveraging both the input description and the source structure. This process is domain-independent, in the sense that it applies to any relation, either flat or nested, describing real-world items. Through our prototype we argue that fully automatic extraction and integration of structured data can be done quickly and effectively when the redundancy of the Web meets knowledge about the data to be extracted. We present the technical details and the overall platform through several application scenarios on real-life Web sources.
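The abstract does not spell out the extraction algorithm, so the following Python sketch is only a rough illustration of the general idea of combining a target description with source structure. It is not ObjectRunner's actual method. It rests on two simplifying assumptions. First, a source's records have already been segmented by its HTML template into rows of text cells. Second, each attribute of a flat target relation comes with a regular-expression recognizer. Nested relations are not covered. Because a template places an attribute in the same slot on every row, each attribute votes for the column where its recognizer fires most often. Rows whose aligned cells do not satisfy the recognizers are then dropped in a best-effort way. All names (`Field`, `align_fields`, `extract`, `min_support`) are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Field:
    """One atomic attribute of the (hypothetical) target description."""
    name: str
    recognizer: re.Pattern  # assumption: a regex stands in for a type recognizer


def align_fields(description, records, min_support=0.5):
    """Map each field to the column where its recognizer matches most often.

    `records` are rows already segmented by a shared template (assumed given),
    so column i corresponds to the same template slot in every row.
    """
    mapping = {}
    for f in description:
        votes = Counter(
            i
            for row in records
            for i, cell in enumerate(row)
            if f.recognizer.fullmatch(cell.strip())
        )
        if votes:
            col, count = votes.most_common(1)[0]
            if count / len(records) >= min_support:  # enough structural redundancy
                mapping[f.name] = col
    return mapping


def extract(description, records, min_support=0.5):
    """Best-effort extraction: keep only rows whose aligned cells all match."""
    mapping = align_fields(description, records, min_support)
    if len(mapping) < len(description):
        return []  # some attribute could not be located in this source
    out = []
    for row in records:
        item = {}
        for f in description:
            col = mapping[f.name]
            if col >= len(row) or not f.recognizer.fullmatch(row[col].strip()):
                break  # row does not fit the description; skip it
            item[f.name] = row[col].strip()
        else:
            out.append(item)
    return out


# Usage on toy rows from a hypothetical concert-listing template.
concerts = [
    Field("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    Field("price", re.compile(r"\$\d+(\.\d{2})?")),
]
rows = [
    ["Jazz Night", "2010-03-12", "$25"],
    ["Blues Live", "2010-04-02", "$30.50"],
    ["Open Mic", "TBA", "free"],
]
print(extract(concerts, rows))
```

In this toy run, `date` aligns to column 1 and `price` to column 2. The third row is discarded because "TBA" and "free" do not satisfy the recognizers. A real system would need far richer recognizers and template inference than this sketch assumes.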

Original language: English
Pages (from-to): 1585-1588
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
Volume: 3
Issue number: 2
Publication status: Published - 1 Jan 2010
Externally published: Yes

