TY - JOUR
T1 - A Bias Injection Technique to Assess the Resilience of Causal Discovery Methods
AU - Cinquini, Martina
AU - Makhlouf, Karima
AU - Zhioua, Sami
AU - Palamidessi, Catuscia
AU - Guidotti, Riccardo
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
AB - Causal discovery (CD) algorithms are increasingly applied to socially and ethically sensitive domains. However, their evaluation under realistic conditions remains challenging due to the scarcity of real-world datasets annotated with ground-truth causal structures. Whereas synthetic data generators support controlled benchmarking, they often overlook forms of bias, such as dependencies involving sensitive attributes, which may significantly affect the observed distribution and compromise the trustworthiness of downstream analysis. This paper introduces a novel synthetic data generation framework that enables controlled bias injection while preserving the causal relationships specified in a ground-truth causal graph. The framework aims to evaluate the reliability of CD methods by examining the impact of varying bias levels and outcome binarization thresholds. Experimental results show that even moderate bias levels can lead CD approaches to fail to correctly infer causal links, particularly those connecting sensitive attributes to decision outcomes. These findings underscore the need for expert validation and highlight the limitations of current CD methods in fairness-critical applications. Our proposal thus provides an essential tool for benchmarking and improving CD algorithms in biased, real-world data settings.
KW - fairness
KW - bias
KW - causal discovery
KW - machine learning
KW - synthetic data generation
UR - https://www.scopus.com/pages/publications/105005965624
U2 - 10.1109/ACCESS.2025.3573201
DO - 10.1109/ACCESS.2025.3573201
M3 - Article
AN - SCOPUS:105005965624
SN - 2169-3536
VL - 13
SP - 97376
EP - 97391
JO - IEEE Access
JF - IEEE Access
ER -