Skyline Operators for Document Spanners

  • Technion - Israel Institute of Technology
  • PSL research University & IPSL
  • Université d'Artois

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

When extracting a relation of spans (intervals) from a text document, a common practice is to filter out tuples of the relation that are deemed dominated by others. The domination rule is defined as a partial order that varies along different systems and tasks. For example, we may state that a tuple is dominated by tuples that extend it by assigning additional attributes, or assigning larger intervals. The result of filtering the relation would then be the skyline according to this partial order. As this filtering may remove most of the extracted tuples, we study whether we can improve the performance of the extraction by compiling the domination rule into the extractor. To this aim, we introduce the skyline operator for declarative information extraction tasks expressed as document spanners. We show that this operator can be expressed via regular operations when the domination partial order can itself be expressed as a regular spanner, which covers several natural domination rules. Yet, we show that the skyline operator incurs a computational cost (under combined complexity). First, there are cases where the operator requires an exponential blowup on the number of states needed to represent the spanner as a sequential variable-set automaton. Second, the evaluation may become computationally hard. Our analysis more precisely identifies classes of domination rules for which the combined complexity is tractable or intractable.
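To make the filtering step concrete, here is a minimal sketch of a skyline computed as a post-processing pass over already-extracted tuples, using the two example domination rules mentioned in the abstract. It only illustrates the semantics of the operator; it is not the construction studied in the paper, which compiles the domination rule into the extractor itself. The names `Span`, `Mapping`, `dominated_by_extension`, `dominated_by_containment`, and `skyline` are hypothetical, and the encoding of a tuple as a dictionary from variable names to half-open intervals is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Assumed encoding: a span is a half-open interval [start, end) over the document,
# and an extracted tuple is a partial assignment of variables to spans.
Span = Tuple[int, int]
Mapping = Dict[str, Span]


def dominated_by_extension(t: Mapping, u: Mapping) -> bool:
    """t is strictly dominated by u if u agrees with t on every variable
    that t assigns and additionally assigns at least one more variable."""
    return len(u) > len(t) and all(v in u and u[v] == s for v, s in t.items())


def dominated_by_containment(t: Mapping, u: Mapping) -> bool:
    """t is strictly dominated by u if both assign the same variables,
    every span of t lies inside the matching span of u, and t != u."""
    if t.keys() != u.keys() or t == u:
        return False
    return all(u[v][0] <= s[0] and s[1] <= u[v][1] for v, s in t.items())


def skyline(tuples: List[Mapping],
            dominated: Callable[[Mapping, Mapping], bool]) -> List[Mapping]:
    """Keep the tuples that no other tuple dominates (naive quadratic filter)."""
    return [t for t in tuples if not any(dominated(t, u) for u in tuples)]


# Made-up extraction result, for illustration only.
extracted: List[Mapping] = [
    {"x": (0, 3)},
    {"x": (0, 3), "y": (4, 7)},
    {"x": (1, 2), "y": (4, 7)},
]

by_extension = skyline(extracted, dominated_by_extension)
by_containment = skyline(extracted, dominated_by_containment)
```

Under the extension rule, the first tuple is dropped because the second one extends it with an assignment for `y`. Under the containment rule, the third tuple is dropped because each of its spans lies inside the corresponding span of the second. This naive filter materializes every extracted tuple before discarding the dominated ones, which is the cost that motivates the question raised in the abstract.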

Original language: English
Title of host publication: 27th International Conference on Database Theory, ICDT 2024
Editors: Graham Cormode, Michael Shekelyan
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959773126
State: Published - 1 Mar 2024
Event: 27th International Conference on Database Theory, ICDT 2024 - Paestum, Italy
Duration: 25 Mar 2024 to 28 Mar 2024

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 290
ISSN (Print): 1868-8969

Conference

Conference: 27th International Conference on Database Theory, ICDT 2024
Country/Territory: Italy
City: Paestum
Period: 25/03/24 to 28/03/24
