TY - GEN
T1 - Integrating (Very) Heterogeneous Data Sources
T2 - 24th European Conference on Advances in Databases and Information Systems, ADBIS 2020
AU - Manolescu, Ioana
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Data integration is a broad area of data management research. It has led to the development of many useful tools and concepts, each appropriate in a certain class of applicative settings. We consider the setting in which data sources have heterogeneous data models. This setting is of increasing relevance, as the (once predominant) relational databases are supplemented by data exchanged in formats such as JSON or XML, graphs such as Linked Open (RDF) data, matrix (numerical) data, etc. We describe two lines of work in this setting. The first aims at improving performance in a polystore setting, where data sources are queried through a structured, composite query language; the focus here is on dramatically improving performance through the use of view-based rewriting techniques. The second line of work assumes that sources are much too heterogeneous for structured querying and thus explores keyword-based search in an integrated graph built from all the available data. Designing and setting up data integration architectures remains a rather complex task; data heterogeneity makes it all the more challenging. We believe much remains to be done to consolidate and advance this area in the future.
AB - Data integration is a broad area of data management research. It has led to the development of many useful tools and concepts, each appropriate in a certain class of applicative settings. We consider the setting in which data sources have heterogeneous data models. This setting is of increasing relevance, as the (once predominant) relational databases are supplemented by data exchanged in formats such as JSON or XML, graphs such as Linked Open (RDF) data, matrix (numerical) data, etc. We describe two lines of work in this setting. The first aims at improving performance in a polystore setting, where data sources are queried through a structured, composite query language; the focus here is on dramatically improving performance through the use of view-based rewriting techniques. The second line of work assumes that sources are much too heterogeneous for structured querying and thus explores keyword-based search in an integrated graph built from all the available data. Designing and setting up data integration architectures remains a rather complex task; data heterogeneity makes it all the more challenging. We believe much remains to be done to consolidate and advance this area in the future.
U2 - 10.1007/978-3-030-54832-2_3
DO - 10.1007/978-3-030-54832-2_3
M3 - Conference contribution
AN - SCOPUS:85090096905
SN - 9783030548315
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 15
EP - 20
BT - Advances in Databases and Information Systems - 24th European Conference, ADBIS 2020, Proceedings
A2 - Darmont, Jérôme
A2 - Novikov, Boris
A2 - Wrembel, Robert
PB - Springer
Y2 - 25 August 2020 through 27 August 2020
ER -