TY - GEN
T1 - Scalability issues in designing and implementing semantic provenance management systems
AU - Sakka, Mohamed Amin
AU - Defude, Bruno
PY - 2012/9/24
Y1 - 2012/9/24
N2 - Provenance is key metadata for assessing the trustworthiness of electronic documents. Most applications that exchange and process documents on the web or in the cloud are becoming provenance-aware, and they produce provenance data that is heterogeneous, decentralized, and not interoperable. A new type of system is emerging, called a provenance management system (PMS). These systems offer a unified way to model, collect, and query provenance data from various applications. This work presents such a system, based on semantic web technologies, and focuses on scalability issues. Modern infrastructures such as the cloud can produce huge volumes of provenance data, so scalability becomes a major issue. We describe an implementation of our PMS based on a NoSQL DBMS coupled with the map-reduce parallel model, and present experiments illustrating that it scales linearly with the size of the processed logs.
AB - Provenance is key metadata for assessing the trustworthiness of electronic documents. Most applications that exchange and process documents on the web or in the cloud are becoming provenance-aware, and they produce provenance data that is heterogeneous, decentralized, and not interoperable. A new type of system is emerging, called a provenance management system (PMS). These systems offer a unified way to model, collect, and query provenance data from various applications. This work presents such a system, based on semantic web technologies, and focuses on scalability issues. Modern infrastructures such as the cloud can produce huge volumes of provenance data, so scalability becomes a major issue. We describe an implementation of our PMS based on a NoSQL DBMS coupled with the map-reduce parallel model, and present experiments illustrating that it scales linearly with the size of the processed logs.
U2 - 10.1007/978-3-642-32344-7_5
DO - 10.1007/978-3-642-32344-7_5
M3 - Conference contribution
AN - SCOPUS:84866375603
SN - 9783642323430
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 49
EP - 61
BT - Data Management in Cloud, Grid and P2P Systems - 5th International Conference, Globe 2012, Proceedings
T2 - 5th International Conference on Data Management in Cloud, Grid, and P2P Systems, Globe 2012
Y2 - 5 September 2012 through 6 September 2012
ER -