Abstract
Uncertain data management has become crucial in many sensing and scientific applications. As user-defined functions (UDFs) become widely used in these applications, an important task is to capture the result uncertainty of queries that evaluate UDFs on uncertain data. In this work, we provide a general framework for supporting UDFs on uncertain data. Specifically, we propose a learning approach based on Gaussian processes (GPs) that computes approximate output distributions of a UDF evaluated on uncertain input, with guaranteed error bounds. We also devise an online algorithm to compute such output distributions, which employs a suite of optimizations to improve accuracy and performance. Our evaluation using both real-world and synthetic functions shows that the proposed GP approach can outperform the state-of-the-art sampling approach by up to two orders of magnitude for a variety of UDFs.
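To make the contrast with the sampling approach concrete, here is a minimal sketch of the general idea. It is not the paper's algorithm: it omits the guaranteed error bounds and the online optimizations mentioned above. It assumes NumPy, a one-dimensional Gaussian input, a fixed RBF length scale, and a hypothetical stand-in UDF `expensive_udf`. Plain Monte Carlo sampling calls the UDF once per input sample. The GP variant calls the UDF at only a few training points, fits a GP, and then pushes the same input samples through the cheap GP posterior mean to approximate the output distribution.

```python
import numpy as np

CALLS = {"udf": 0}  # counts invocations of the (hypothetical) expensive UDF

def expensive_udf(x):
    """Hypothetical stand-in for a costly black-box UDF (vectorized over x)."""
    x = np.asarray(x, dtype=float)
    CALLS["udf"] += x.size
    return np.sin(x) + 0.1 * x ** 2

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel with unit signal variance for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, length_scale=1.0, jitter=1e-6):
    """Zero-mean GP regression: posterior mean and std at x_test."""
    K = rbf_kernel(x_train, x_train, length_scale) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    K_s = rbf_kernel(x_test, x_train, length_scale)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)  # prior variance is 1
    return mean, np.sqrt(var)

rng = np.random.default_rng(0)
mu, sigma, n_samples = 1.0, 0.5, 10_000  # uncertain input X ~ N(mu, sigma^2)
samples = rng.normal(mu, sigma, n_samples)

# Baseline: Monte Carlo sampling calls the UDF once per input sample.
CALLS["udf"] = 0
y_mc = expensive_udf(samples)
mc_calls = CALLS["udf"]

# GP surrogate: call the UDF at a few points covering mu +/- 5 sigma,
# then evaluate the cheap GP posterior mean on the same samples.
CALLS["udf"] = 0
x_train = np.linspace(mu - 5 * sigma, mu + 5 * sigma, 20)
y_train = expensive_udf(x_train)
gp_calls = CALLS["udf"]
y_gp, y_std = gp_posterior(x_train, y_train, samples)

print(f"UDF calls: sampling={mc_calls}, GP={gp_calls}")
print(f"output mean: sampling={y_mc.mean():.4f}, GP={y_gp.mean():.4f}")
print(f"max |GP - true| over samples: {np.max(np.abs(y_gp - y_mc)):.2e}")
```

Here sampling needs 10,000 UDF calls, while the surrogate needs 20. The GP posterior standard deviation `y_std` gives only a heuristic sense of where the surrogate is unreliable. It is not a substitute for the error bounds the paper derives.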
| Original language | English |
|---|---|
| Pages (from-to) | 469-480 |
| Number of pages | 12 |
| Journal | Proceedings of the VLDB Endowment |
| Volume | 6 |
| Issue number | 6 |
| Publication status | Published - 1 Jan 2013 |
| Externally published | Yes |
| Event | 39th International Conference on Very Large Data Bases, VLDB 2013, Trento, Italy; 26 Aug 2013 to 30 Aug 2013 |