Finding the PG schema of any (semi)structured dataset: a tale of graphs and abstraction

Nelly Barret, Tudor Enache, Ioana Manolescu, Madhulika Mohanty

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Property Graphs (PGs) are an attractive data model both for business users and for developers of data management tools. They combine the internal structure helpful in relational databases, where each record has a clearly identified set of attributes, with the flexible structure and support for heterogeneity common in graph databases. Several useful and/or interesting datasets are available in non-PG data models. These include legacy databases created before the advent of the PG standards, well-known benchmarks based on real and synthetic data, and Open Data published in other formats such as XML, JSON or RDF. Converting such datasets to Property Graphs would enable their exploitation under the PG model. In this work-in-progress paper, we describe an approach to derive, from any (semi)structured dataset, a PG schema consisting of node types, edge types, and a graph type. Our approach builds on (i) ConnectionLens, a tool for converting (semi)structured datasets into simple data graphs, and (ii) Abstra, which identifies a set of entities and relationships in a ConnectionLens graph. This work is the first step towards a universal tool for migrating (semi)structured data to PGs.
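To make the three schema components named above (node types, edge types, graph type) concrete, the following is a minimal Python sketch under stated assumptions. It assumes a toy input in which each node carries a set of labels and a property map, and each edge is a labelled (source, label, target) triple. It uses a deliberately naive strategy: nodes with the same label set and the same property keys share a node type, and each distinct (edge label, source type, target type) combination becomes an edge type. This is not the ConnectionLens/Abstra pipeline described in the paper, and all names (`NodeType`, `EdgeType`, `GraphType`, `infer_graph_type`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeType:
    # Hypothetical: a node type is identified by its label set and property keys.
    labels: frozenset
    properties: frozenset


@dataclass(frozen=True)
class EdgeType:
    # Hypothetical: an edge type is an edge label plus its endpoint node types.
    label: str
    source: NodeType
    target: NodeType


@dataclass
class GraphType:
    # The graph type collects every node type and edge type found.
    node_types: set = field(default_factory=set)
    edge_types: set = field(default_factory=set)


def infer_graph_type(nodes, edges):
    """nodes: dict id -> (labels, property dict); edges: list of (src, label, dst)."""
    graph_type = GraphType()
    node_type_of = {}
    for node_id, (labels, props) in nodes.items():
        # frozenset(props) keeps only the property keys, not their values.
        nt = NodeType(frozenset(labels), frozenset(props))
        node_type_of[node_id] = nt
        graph_type.node_types.add(nt)
    for src, label, dst in edges:
        graph_type.edge_types.add(EdgeType(label, node_type_of[src], node_type_of[dst]))
    return graph_type


# Toy usage: two Person nodes with the same keys collapse into one node type.
nodes = {
    "p1": ({"Person"}, {"name": "Ada", "born": 1815}),
    "p2": ({"Person"}, {"name": "Charles", "born": 1791}),
    "b1": ({"Book"}, {"title": "Notes"}),
}
edges = [("p1", "wrote", "b1"), ("p2", "knows", "p1")]
gt = infer_graph_type(nodes, edges)
print(len(gt.node_types), len(gt.edge_types))  # 2 2
```

Grouping by exact key sets is the simplest possible choice; real heterogeneous data usually calls for tolerating optional properties, which this sketch does not attempt.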

Original language: English
Title of host publication: Proceedings - 2024 IEEE 40th International Conference on Data Engineering Workshops, ICDEW 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 365-369
Number of pages: 5
ISBN (Electronic): 9798350317152
Publication status: Published - 1 Jan 2024
Event: 40th IEEE International Conference on Data Engineering Workshops, ICDEW 2024 - Utrecht, Netherlands
Duration: 13 May 2024 to 16 May 2024

Publication series

Name: Proceedings - 2024 IEEE 40th International Conference on Data Engineering Workshops, ICDEW 2024

Conference

Conference: 40th IEEE International Conference on Data Engineering Workshops, ICDEW 2024
Country/Territory: Netherlands
City: Utrecht
Period: 13/05/24 to 16/05/24

Keywords

  • Property graphs
  • heterogeneous data
  • schema discovery
  • schema mapping
