Discovering Conflicts of Interest across Heterogeneous Data Sources with ConnectionLens

  • Angelos Christos Anadiotis
  • Oana Balalau
  • Théo Bouganim
  • Francesco Chimienti
  • Helena Galhardas
  • Mhd Yamen Haddad
  • Stéphane Horel
  • Ioana Manolescu
  • Youssr Youssef

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Investigative Journalism (IJ for short) requires combining highly heterogeneous digital datasets coming from a wide variety of sources. We have developed ConnectionLens, a system that integrates such sources into a single heterogeneous graph and enables users to query the graph using keywords. The first iteration of the system [7] followed a mediator architecture, which severely constrained its query scalability. We therefore fully re-engineered the system, moving it to a warehouse architecture and replacing its core components (information extraction, data querying, and interactive interfaces), which allowed us to handle use cases orders of magnitude larger than the previous platform could. As a consortium of computer scientists and investigative journalists, we propose to demonstrate ConnectionLens' capability to integrate arbitrary heterogeneous datasets and query them flexibly by means of keywords. Among several scenarios, our main focus will be on a real-world journalistic use case about situations which may lead to Conflicts of Interest between biomedical experts and various organizations, such as corporations and lobbies. The demonstration will showcase the end-to-end data analysis pipeline and illustrate each system component as well as the different parameters governing graph creation and querying.
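
To make the idea of keyword querying over an integrated graph concrete, here is a minimal sketch. It is not the ConnectionLens implementation or its search algorithm. It assumes a toy in-memory graph of labeled edges and uses plain breadth-first search to find a shortest connection between nodes matching two keywords. All entity names, edge labels, and function names (`build_graph`, `match_nodes`, `shortest_connection`, `keyword_query`) are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy data: (node, edge label, node) triples standing in for
# facts extracted from several heterogeneous sources.
EDGES = [
    ("Dr. Alice Example", "authorOf", "Study on Compound X"),
    ("Study on Compound X", "fundedBy", "Acme Pharma"),
    ("Acme Pharma", "memberOf", "Example Lobby Group"),
    ("Dr. Bob Sample", "employedBy", "Public Health Agency"),
]


def build_graph(edges):
    """Build an undirected adjacency map from labeled edge triples."""
    graph = {}
    for src, label, dst in edges:
        graph.setdefault(src, []).append((label, dst))
        graph.setdefault(dst, []).append((label, src))
    return graph


def match_nodes(graph, keyword):
    """Return nodes whose text contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return sorted(node for node in graph if kw in node.lower())


def shortest_connection(graph, start, goal):
    """Breadth-first search; returns [node, label, node, ...] or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for label, neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = (node, label)
                queue.append(neighbor)
    if goal not in parents:
        return None
    # Walk back from the goal to the start, then reverse.
    path = [goal]
    node = goal
    while parents[node] is not None:
        previous, label = parents[node]
        path.extend([label, previous])
        node = previous
    path.reverse()
    return path


def keyword_query(graph, first_keyword, second_keyword):
    """Connect every pair of matching nodes; shortest results first."""
    results = []
    for a in match_nodes(graph, first_keyword):
        for b in match_nodes(graph, second_keyword):
            path = shortest_connection(graph, a, b)
            if path is not None:
                results.append(path)
    return sorted(results, key=len)


GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    for connection in keyword_query(GRAPH, "alice", "acme"):
        print(" -> ".join(connection))
```

In this sketch a query for "alice" and "acme" surfaces the chain linking an expert to a company through a funded study, which is the kind of connection the conflict-of-interest use case is about. A real system would also need entity extraction, result scoring, and support for more than two keywords.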

Original language: English
Title of host publication: CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 4670-4674
Number of pages: 5
ISBN (Electronic): 9781450384469
Publication status: Published - 30 Oct 2021
Externally published: Yes
Event: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia
Duration: 1 Nov 2021 to 5 Nov 2021

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
ISSN (Print): 2155-0751

Conference

Conference: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Country/Territory: Australia
City: Virtual, Online
Period: 1/11/21 to 5/11/21

Keywords

  • data integration
  • graph databases
  • investigative journalism
  • keyword search
