TY - GEN
T1 - Finding the PG schema of any (semi)structured dataset
T2 - 40th IEEE International Conference on Data Engineering Workshops, ICDEW 2024
AU - Barret, Nelly
AU - Enache, Tudor
AU - Manolescu, Ioana
AU - Mohanty, Madhulika
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
AB - Property Graphs (PGs) are an attractive data model both for business users and for developers of data management tools. They combine the internal structure helpful in relational databases, where each record has a clearly identified set of attributes, with the flexible structure and support for heterogeneity common in graph databases. Several useful and/or interesting datasets are available in non-PG data models. These include legacy databases created before the advent of the PG standards, well-known benchmarks based on real and synthetic data, and Open Data published in other formats such as XML, JSON, or RDF. Converting such datasets to Property Graphs would enable their exploitation under the PG model. In this work-in-progress paper, we describe an approach to derive, from any (semi)structured dataset, a PG schema consisting of node types, edge types, and a graph type. Our approach builds on (i) ConnectionLens, a tool for converting (semi)structured datasets into simple data graphs, and (ii) Abstra, which identifies a set of entities and relationships in a ConnectionLens graph. This work is the first step towards a universal data migration tool from (semi)structured data to PGs.
KW - Property graphs
KW - heterogeneous data
KW - schema discovery
KW - schema mapping
U2 - 10.1109/ICDEW61823.2024.00055
DO - 10.1109/ICDEW61823.2024.00055
M3 - Conference contribution
AN - SCOPUS:85197341502
T3 - Proceedings - 2024 IEEE 40th International Conference on Data Engineering Workshops, ICDEW 2024
SP - 365
EP - 369
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering Workshops, ICDEW 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 May 2024 through 16 May 2024
ER -