TY - GEN
T1 - Discovering Conflicts of Interest across Heterogeneous Data Sources with ConnectionLens
AU - Anadiotis, Angelos Christos
AU - Balalau, Oana
AU - Bouganim, Théo
AU - Chimienti, Francesco
AU - Galhardas, Helena
AU - Haddad, Mhd Yamen
AU - Horel, Stéphane
AU - Manolescu, Ioana
AU - Youssef, Youssr
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/10/30
Y1 - 2021/10/30
N2 - Investigative Journalism (IJ, in short) requires combining highly heterogeneous digital datasets coming from a wide variety of sources. We have developed ConnectionLens, a system that integrates such sources into a single heterogeneous graph and enables users to query the graph using keywords. The first iteration of the system [7] followed a mediator architecture, which severely constrained its query scalability. Thus, we fully re-engineered the system, moving it to a warehouse architecture and replacing its core components (information extraction, data querying, and interactive interfaces), which allowed us to handle use cases orders of magnitude larger than those handled by the previous platform. In a consortium of computer scientists and investigative journalists, we propose to demonstrate ConnectionLens' capability to integrate arbitrary heterogeneous datasets and query them flexibly by means of keywords. Among several scenarios, our main focus will be on a real-world journalistic use case about situations which may lead to Conflicts of Interest between biomedical experts and various organizations, such as corporations, lobbies, etc. The demonstration will showcase the end-to-end data analysis pipeline and illustrate each system component and the different parameters governing graph creation and querying.
AB - Investigative Journalism (IJ, in short) requires combining highly heterogeneous digital datasets coming from a wide variety of sources. We have developed ConnectionLens, a system that integrates such sources into a single heterogeneous graph and enables users to query the graph using keywords. The first iteration of the system [7] followed a mediator architecture, which severely constrained its query scalability. Thus, we fully re-engineered the system, moving it to a warehouse architecture and replacing its core components (information extraction, data querying, and interactive interfaces), which allowed us to handle use cases orders of magnitude larger than those handled by the previous platform. In a consortium of computer scientists and investigative journalists, we propose to demonstrate ConnectionLens' capability to integrate arbitrary heterogeneous datasets and query them flexibly by means of keywords. Among several scenarios, our main focus will be on a real-world journalistic use case about situations which may lead to Conflicts of Interest between biomedical experts and various organizations, such as corporations, lobbies, etc. The demonstration will showcase the end-to-end data analysis pipeline and illustrate each system component and the different parameters governing graph creation and querying.
KW - data integration
KW - graph databases
KW - investigative journalism
KW - keyword search
U2 - 10.1145/3459637.3481982
DO - 10.1145/3459637.3481982
M3 - Conference contribution
AN - SCOPUS:85119173670
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 4670
EP - 4674
BT - CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Y2 - 1 November 2021 through 5 November 2021
ER -