A framework for sampling-based XML data pricing

Ruiming Tang, Antoine Amarilli, Pierre Senellart, Stéphane Bressan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

While price and data quality should define the major tradeoff for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, sets the price of the entire document, and assigns each node a weight reflecting its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weights of the nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We present a pseudo-polynomial-time algorithm that selects, uniformly at random, a rooted subtree of prescribed weight, but show that this problem is unfortunately intractable in general. Yet we are able to identify several practical cases where our algorithm runs in polynomial time. The first case is uniform random sampling of a rooted subtree of prescribed size rather than prescribed weight; the second case restricts node weights to binary values. As a more challenging scenario for the sampling problem, we also study the uniform sampling of a rooted subtree of prescribed weight and prescribed height. We adapt our pseudo-polynomial-time algorithm to this setting and identify tractable cases.
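To make the counting-then-sampling idea behind uniform sampling of a rooted subtree with prescribed weight concrete, here is a minimal Python sketch, assuming nonnegative integer node weights and a tree given as a children map. It is not the authors' algorithm, only a standard dynamic program of the same flavor: each node v gets a count vector f_v, where f_v[w] is the number of rooted subtrees hanging from v (and containing v) with total weight w, computed as x^weight(v) times the product over the children c of (1 + f_c), truncated at the target weight W. A sample is then drawn top-down by deciding, child by child, whether to leave it out or how much weight its subtree receives, with probabilities proportional to the number of completions. The class and method names (`RootedSubtreeSampler`, `count`, `sample`) and the `cap` parameter are hypothetical. The counting pass uses O(n·W²) arithmetic operations, which is pseudo-polynomial in W; when all weights equal 1 (prescribed size) or are restricted to {0, 1}, W never needs to exceed n, so the operation count becomes polynomial. The height-constrained variant is not covered here.

```python
import random
from typing import Dict, List, Optional, Set


def poly_mul(a: List[int], b: List[int], cap: int) -> List[int]:
    """Multiply two count vectors (index = total weight), dropping weights above cap."""
    out = [0] * (cap + 1)
    for i, x in enumerate(a):
        if x == 0:
            continue
        for j, y in enumerate(b):
            if i + j > cap:
                break
            out[i + j] += x * y
    return out


class RootedSubtreeSampler:
    """Hypothetical helper: uniform sampling of rooted subtrees with exact total weight."""

    def __init__(self, children: Dict[int, List[int]], weight: Dict[int, int],
                 root: int, cap: int) -> None:
        self.children, self.weight, self.root, self.cap = children, weight, root, cap
        self.f: Dict[int, List[int]] = {}             # f[v][w]: subtrees rooted at v with weight w
        self.suffix: Dict[int, List[List[int]]] = {}  # suffix[v][i]: product over children[v][i:] of (1 + f[c])
        # Iterative preorder; reversing it visits every child before its parent.
        order, stack = [], [root]
        while stack:
            v = stack.pop()
            order.append(v)
            stack.extend(children.get(v, []))
        for v in reversed(order):
            self._combine(v)

    def _combine(self, v: int) -> None:
        one = [1] + [0] * self.cap                    # empty product: one way, weight 0
        suf = [one]
        for c in reversed(self.children.get(v, [])):
            opt = self.f[c][:]
            opt[0] += 1                               # extra option: leave c's whole subtree out
            suf.append(poly_mul(opt, suf[-1], self.cap))
        suf.reverse()                                 # now suf[i] covers children[i:]
        self.suffix[v] = suf
        wv = self.weight[v]
        fv = [0] * (self.cap + 1)
        for t in range(self.cap + 1 - wv):            # v itself is always included: shift by wv
            fv[t + wv] = suf[0][t]
        self.f[v] = fv

    def count(self, w: int) -> int:
        """Number of rooted subtrees of the whole tree with total weight w."""
        return self.f[self.root][w] if 0 <= w <= self.cap else 0

    def sample(self, w: int, rng: random.Random) -> Optional[Set[int]]:
        """Return a uniformly random rooted subtree of weight w, or None if none exists."""
        if self.count(w) == 0:
            return None
        chosen: Set[int] = set()
        stack = [(self.root, w)]
        while stack:
            v, target = stack.pop()
            chosen.add(v)
            r = target - self.weight[v]               # weight left for v's descendants
            suf = self.suffix[v]
            for i, c in enumerate(self.children.get(v, [])):
                pick = rng.randrange(suf[i][r])       # uniform over completions of children[i:]
                pick -= suf[i + 1][r]                 # completions that leave c out
                if pick < 0:
                    continue
                for t in range(r + 1):                # completions giving c's subtree weight t
                    pick -= self.f[c][t] * suf[i + 1][r - t]
                    if pick < 0:
                        stack.append((c, t))
                        r -= t
                        break
        return chosen
```

For example, with root 0 having children 1 and 2, node 1 having child 3, and weights 1, 2, 1, 1 for nodes 0 to 3, `RootedSubtreeSampler({0: [1, 2], 1: [3]}, {0: 1, 1: 2, 2: 1, 3: 1}, root=0, cap=5).count(4)` is 2, and `sample(4, rng)` returns either {0, 1, 2} or {0, 1, 3}, each with probability 1/2. Keeping exact Python integers for the counts and drawing with `randrange` makes each draw exactly uniform rather than approximately so.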

Original language: English
Title of host publication: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV
Editors: Sebastian Link, Lenka Lhotska, Hendrik Decker, Abdelkader Hameurlain, Josef Küng, Roland Wagner
Publisher: Springer Verlag
Pages: 116-138
Number of pages: 23
ISBN (Print): 9783662492130
Publication status: Published, 1 Jan 2016
Externally published: Yes
Event: 25th International Conference on Database and Expert Systems Applications, DEXA 2014 - Munich, Germany
Duration: 1 Sept 2014 to 4 Sept 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9510
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International Conference on Database and Expert Systems Applications, DEXA 2014
Country/Territory: Germany
City: Munich
Period: 1/09/14 to 4/09/14
