Human-in-the-loop schema inference for massive JSON datasets

Mohamed Amine Baazizi, Clément Berti, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

JSON has established itself as a popular format for representing data whose structure is irregular or unknown a priori. JSON collections are usually massive and schema-less. Inferring a schema that describes the structure of these collections is crucial for formulating meaningful queries and for applying schema-based optimizations. In recent work, we proposed a Map/Reduce schema inference approach that infers either a compact representation of the input collection or a precise description of every possible shape in the data. Since no single level of precision suits every task, it is more appealing to let the analyst choose among different levels of precision interactively. In this paper we describe a schema inference system offering this important functionality.
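To make the two ends of the precision spectrum concrete, the following Python sketch infers a schema in Map/Reduce style: each document is mapped to a type, and the types are combined pairwise by a fusion function. The type encoding, the function names `infer`, `fuse`, `merge_alt` and `infer_schema`, and the mode names `"precise"` and `"compact"` are hypothetical simplifications introduced for illustration; they are not the authors' type language or implementation. In `"precise"` mode every distinct record shape is kept as a separate union member; in `"compact"` mode records are merged field by field, and a field missing from some documents becomes optional.

```python
from functools import reduce
from typing import Any, FrozenSet, Tuple

# Hypothetical type encoding (illustrative only, not the paper's):
#   a type is a frozenset of alternatives (i.e. a union);
#   an alternative is ("null",), ("bool",), ("num",), ("str",),
#   ("arr", item_type) or ("rec", ((field, type, optional), ...)).
Type = FrozenSet[Tuple[Any, ...]]


def merge_alt(a: Tuple[Any, ...], b: Tuple[Any, ...], mode: str) -> Tuple[Any, ...]:
    """Merge two alternatives of the same kind (used only in compact mode)."""
    kind = a[0]
    if kind == "arr":
        return ("arr", fuse(a[1], b[1], mode))
    if kind == "rec":
        fa = {name: (t, opt) for name, t, opt in a[1]}
        fb = {name: (t, opt) for name, t, opt in b[1]}
        fields = []
        for name in sorted(fa.keys() | fb.keys()):
            if name in fa and name in fb:
                # Field present on both sides: fuse its types; optional if optional in either.
                fields.append((name, fuse(fa[name][0], fb[name][0], mode),
                               fa[name][1] or fb[name][1]))
            else:
                # Field missing from one side: keep its type and mark it optional.
                t = fa[name][0] if name in fa else fb[name][0]
                fields.append((name, t, True))
        return ("rec", tuple(fields))
    return a  # two scalar alternatives of the same kind are identical


def fuse(t1: Type, t2: Type, mode: str) -> Type:
    """Reduce step: combine two inferred types."""
    if mode == "precise":
        # Keep every distinct shape as a separate union member.
        return t1 | t2
    if mode != "compact":
        raise ValueError(f"unknown mode: {mode}")
    # Compact: keep at most one alternative per kind, merging same-kind alternatives.
    by_kind = {}
    for alt in list(t1) + list(t2):
        k = alt[0]
        by_kind[k] = merge_alt(by_kind[k], alt, mode) if k in by_kind else alt
    return frozenset(by_kind.values())


def infer(value: Any, mode: str) -> Type:
    """Map step: infer the type of one JSON value (as produced by json.loads)."""
    if value is None:
        return frozenset({("null",)})
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return frozenset({("bool",)})
    if isinstance(value, (int, float)):
        return frozenset({("num",)})
    if isinstance(value, str):
        return frozenset({("str",)})
    if isinstance(value, list):
        # Array item type is the fusion of all element types (empty array -> empty union).
        items = reduce(lambda acc, x: fuse(acc, infer(x, mode), mode), value, frozenset())
        return frozenset({("arr", items)})
    if isinstance(value, dict):
        fields = tuple((k, infer(value[k], mode), False) for k in sorted(value))
        return frozenset({("rec", fields)})
    raise TypeError(f"not a JSON value: {value!r}")


def infer_schema(docs, mode: str) -> Type:
    """Map every document to a type, then fold all types together with fuse."""
    return reduce(lambda acc, t: fuse(acc, t, mode),
                  (infer(d, mode) for d in docs), frozenset())
```

On the input `[{"a": 1}, {"a": "x", "b": True}]`, the precise mode yields two record alternatives, while the compact mode yields a single record in which `a` has type `num | str` and `b` is optional; an interactive tool could let the analyst switch between such views. In this sketch `fuse` is commutative and associative, which is what allows the reduce step to be split across workers and combined in any order.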

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2020
Subtitle of host publication: 23rd International Conference on Extending Database Technology, Proceedings
Editors: Angela Bonifati, Yongluan Zhou, Marcos Antonio Vaz Salles, Alexander Bohm, Dan Olteanu, George Fletcher, Arijit Khan, Bin Yang
Publisher: OpenProceedings.org
Pages: 635-638
Number of pages: 4
ISBN (Electronic): 9783893180837
Publication status: Published - 1 Jan 2020
Externally published: Yes
Event: 23rd International Conference on Extending Database Technology, EDBT 2020 - Copenhagen, Denmark
Duration: 30 Mar 2020 - 2 Apr 2020

Publication series

Name: Advances in Database Technology - EDBT
Volume: 2020-March
ISSN (Electronic): 2367-2005

Conference

Conference: 23rd International Conference on Extending Database Technology, EDBT 2020
Country/Territory: Denmark
City: Copenhagen
Period: 30/03/20 - 2/04/20
