TY - JOUR
T1 - Mixture of Conditional Gaussian Graphical Models for Unlabelled Heterogeneous Populations in the Presence of Co-factors
AU - Lartigue, Thomas
AU - Durrleman, Stanley
AU - Allassonnière, Stéphanie
N1 - Publisher Copyright: © 2021, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
PY - 2021/11/1
Y1 - 2021/11/1
N2 - Conditional correlation networks, within Gaussian Graphical Models (GGM), are widely used to describe the direct interactions between the components of a random vector. In the case of an unlabelled heterogeneous population, Expectation Maximisation (EM) algorithms for Mixtures of GGM have been proposed to estimate both each sub-population’s graph and the class labels. However, we argue that, with most real data, class affiliation cannot be described with a Mixture of Gaussians, which mostly groups data points according to their geometrical proximity. In particular, there often exist external co-features whose values affect the features’ average value, scattering data points that belong to the same sub-population across the feature space. Additionally, if the co-features’ effect on the features is heterogeneous, then the estimation of this effect cannot be separated from the sub-population identification. In this article, we propose a Mixture of Conditional GGM (CGGM) that subtracts the heterogeneous effects of the co-features to regroup the data points into clusters corresponding to the sub-populations. We develop a penalised EM algorithm to estimate graph-sparse model parameters. We demonstrate on synthetic and real data how this method fulfils its goal and succeeds in identifying the sub-populations where the Mixtures of GGM are disrupted by the effect of the co-features.
AB - Conditional correlation networks, within Gaussian Graphical Models (GGM), are widely used to describe the direct interactions between the components of a random vector. In the case of an unlabelled heterogeneous population, Expectation Maximisation (EM) algorithms for Mixtures of GGM have been proposed to estimate both each sub-population’s graph and the class labels. However, we argue that, with most real data, class affiliation cannot be described with a Mixture of Gaussians, which mostly groups data points according to their geometrical proximity. In particular, there often exist external co-features whose values affect the features’ average value, scattering data points that belong to the same sub-population across the feature space. Additionally, if the co-features’ effect on the features is heterogeneous, then the estimation of this effect cannot be separated from the sub-population identification. In this article, we propose a Mixture of Conditional GGM (CGGM) that subtracts the heterogeneous effects of the co-features to regroup the data points into clusters corresponding to the sub-populations. We develop a penalised EM algorithm to estimate graph-sparse model parameters. We demonstrate on synthetic and real data how this method fulfils its goal and succeeds in identifying the sub-populations where the Mixtures of GGM are disrupted by the effect of the co-features.
KW - Conditional Gaussian Graphical Models
KW - EM algorithm
KW - Mixture models
U2 - 10.1007/s42979-021-00865-5
DO - 10.1007/s42979-021-00865-5
M3 - Article
AN - SCOPUS:85131792478
SN - 2662-995X
VL - 2
JO - SN Computer Science
JF - SN Computer Science
IS - 6
M1 - 466
ER -