Skip to main navigation Skip to search Skip to main content

Skyline Operators for Document Spanners

  • Technion - Israel Institute of Technology
  • PSL research University & IPSL
  • Université d'Artois

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

When extracting a relation of spans (intervals) from a text document, a common practice is to filter out tuples of the relation that are deemed dominated by others. The domination rule is defined as a partial order that varies along different systems and tasks. For example, we may state that a tuple is dominated by tuples that extend it by assigning additional attributes, or assigning larger intervals. The result of filtering the relation would then be the skyline according to this partial order. As this filtering may remove most of the extracted tuples, we study whether we can improve the performance of the extraction by compiling the domination rule into the extractor. To this aim, we introduce the skyline operator for declarative information extraction tasks expressed as document spanners. We show that this operator can be expressed via regular operations when the domination partial order can itself be expressed as a regular spanner, which covers several natural domination rules. Yet, we show that the skyline operator incurs a computational cost (under combined complexity). First, there are cases where the operator requires an exponential blowup on the number of states needed to represent the spanner as a sequential variable-set automaton. Second, the evaluation may become computationally hard. Our analysis more precisely identifies classes of domination rules for which the combined complexity is tractable or intractable.

Original languageEnglish
Title of host publication27th International Conference on Database Theory, ICDT 2024
EditorsGraham Cormode, Michael Shekelyan
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic)9783959773126
DOIs
Publication statusPublished - 1 Mar 2024
Event27th International Conference on Database Theory, ICDT 2024 - Paestum, Italy
Duration: 25 Mar 202428 Mar 2024

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume290
ISSN (Print)1868-8969

Conference

Conference27th International Conference on Database Theory, ICDT 2024
Country/TerritoryItaly
CityPaestum
Period25/03/2428/03/24

Keywords

  • Document Spanners
  • Information Extraction
  • Query Evaluation

Fingerprint

Dive into the research topics of 'Skyline Operators for Document Spanners'. Together they form a unique fingerprint.

Cite this