TY - GEN
T1 - Streaming saturation for large RDF graphs with dynamic schema information
AU - Farvardin, Mohammad Amin
AU - Colazzo, Dario
AU - Belhajjame, Khalid
AU - Sartiani, Carlo
N1 - Publisher Copyright: © 2019 Copyright held by the owner/author(s).
PY - 2019/6/23
Y1 - 2019/6/23
N2 - In the Big Data era, RDF data are produced in high volumes. While there exist proposals for reasoning over large RDF graphs using big data platforms, there is a dearth of solutions that do so in environments where RDF data are dynamic, and where new instance and schema triples can arrive at any time. In this work, we present the first solution for reasoning over large streams of RDF data using big data platforms. In doing so, we focus on the saturation operation, which seeks to infer implicit RDF triples given RDF schema constraints. Indeed, unlike existing solutions which saturate RDF data in bulk, our solution carefully identifies the fragment of the existing (and already saturated) RDF dataset that needs to be considered given the fresh RDF statements delivered by the stream. Thereby, it performs the saturation in an incremental manner. Experimental analysis shows that our solution outperforms existing bulk-based saturation solutions.
KW - Big Data
KW - RDF saturation
KW - RDF streams
KW - Spark
UR - https://www.scopus.com/pages/publications/85071176833
U2 - 10.1145/3315507.3330201
DO - 10.1145/3315507.3330201
M3 - Conference contribution
AN - SCOPUS:85071176833
T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
SP - 42
EP - 52
BT - DBPL 2019 - Proceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages, co-located with PLDI 2019
A2 - Cheung, Alvin
A2 - Nguyen, Kim
PB - Association for Computing Machinery
T2 - 17th ACM SIGPLAN International Symposium on Database Programming Languages, DBPL 2019, co-located with PLDI 2019
Y2 - 23 June 2019
ER -