TY - GEN
T1 - Get a sample for a discount: Sampling-based XML data pricing
AU - Tang, Ruiming
AU - Amarilli, Antoine
AU - Senellart, Pierre
AU - Bressan, Stéphane
PY - 2014/1/1
Y1 - 2014/1/1
N2 - While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We show that the uniform random sampling of a rooted subtree with prescribed weight is unfortunately intractable. However, we are able to identify several practical cases that are tractable. The first case is uniform random sampling of a rooted subtree with prescribed size; the second case restricts to binary weights. For both these practical cases we present polynomial-time algorithms and explain how they can be integrated into an iterative exploratory sampling approach.
AB - While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We show that the uniform random sampling of a rooted subtree with prescribed weight is unfortunately intractable. However, we are able to identify several practical cases that are tractable. The first case is uniform random sampling of a rooted subtree with prescribed size; the second case restricts to binary weights. For both these practical cases we present polynomial-time algorithms and explain how they can be integrated into an iterative exploratory sampling approach.
U2 - 10.1007/978-3-319-10073-9_3
DO - 10.1007/978-3-319-10073-9_3
M3 - Conference contribution
AN - SCOPUS:84958549675
SN - 9783319100722
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 20
EP - 34
BT - Database and Expert Systems Applications - 25th International Conference, DEXA 2014, Proceedings
PB - Springer Verlag
T2 - 25th International Conference on Database and Expert Systems Applications, DEXA 2014
Y2 - 1 September 2014 through 4 September 2014
ER -