Graph integration of structured, semistructured and unstructured data for data journalism

  • Angelos Christos Anadiotis
  • Oana Balalau
  • Catarina Conceição
  • Helena Galhardas
  • Mhd Yamen Haddad
  • Ioana Manolescu
  • Tayeb Merabti
  • Jingmao You

Research output: Contribution to journal › Article › peer-review

Abstract

Digital data is a gold mine for modern journalism. However, the datasets that interest journalists are extremely heterogeneous, ranging from highly structured (relational databases) and semi-structured (JSON, XML, HTML) data to graphs (e.g., RDF) and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows, especially for dynamically varying sets of data sources. We describe a complete approach for integrating dynamic sets of heterogeneous datasets along the lines described above: the challenges we faced in making such graphs useful and allowing their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.

Original language: English
Article number: 101846
Journal: Information Systems
Volume: 104
Publication status: Published - 1 Feb 2022

Keywords

  • Data journalism
  • Heterogeneous data integration
  • Information extraction
