TY - GEN
T1 - Bayesian methods for graph clustering
AU - Latouche, Pierre
AU - Birmelé, Etienne
AU - Ambroise, Christophe
PY - 2010/12/1
Y1 - 2010/12/1
N2 - Networks are used in many scientific fields such as biology, social science, and information technology. They aim at modelling, with edges, the way objects of interest, represented by vertices, are related to each other. Looking for clusters of vertices, also called communities or modules, has appeared to be a powerful approach for capturing the underlying structure of a network. In this context, the Block-Clustering model has been applied on random graphs. The principle of this method is to assume that given the latent structure of a graph, the edges are independent and generated from a parametric distribution. Many EM-like strategies have been proposed, in a frequentist setting, to optimize the parameters of themodel. Moreover, a criterion, based on an asymptotic approximation of the Integrated Classification Likelihood (ICL), has recently been derived to estimate the number of classes in the latent structure. In this paper, we show how the Block-Clustering model can be described in a full Bayesian framework and how the posterior distribution, of all the parameters and latent variables, can be approximated efficiently applying Variational Bayes (VB). We also propose a new non-asymptotic Bayesian model selection criterion. Using simulated data sets, we compare our approach to other strategies. We show that our criterion can outperform ICL.
AB - Networks are used in many scientific fields such as biology, social science, and information technology. They aim at modelling, with edges, the way objects of interest, represented by vertices, are related to each other. Looking for clusters of vertices, also called communities or modules, has appeared to be a powerful approach for capturing the underlying structure of a network. In this context, the Block-Clustering model has been applied on random graphs. The principle of this method is to assume that given the latent structure of a graph, the edges are independent and generated from a parametric distribution. Many EM-like strategies have been proposed, in a frequentist setting, to optimize the parameters of themodel. Moreover, a criterion, based on an asymptotic approximation of the Integrated Classification Likelihood (ICL), has recently been derived to estimate the number of classes in the latent structure. In this paper, we show how the Block-Clustering model can be described in a full Bayesian framework and how the posterior distribution, of all the parameters and latent variables, can be approximated efficiently applying Variational Bayes (VB). We also propose a new non-asymptotic Bayesian model selection criterion. Using simulated data sets, we compare our approach to other strategies. We show that our criterion can outperform ICL.
KW - Bayesian model selection
KW - Block-clustering model
KW - Integrated classification likelihood
KW - Random graphs
KW - Variational Bayes
KW - Variational EM
UR - https://www.scopus.com/pages/publications/84879560658
U2 - 10.1007/978-3-642-01044-6-21
DO - 10.1007/978-3-642-01044-6-21
M3 - Conference contribution
AN - SCOPUS:84879560658
SN - 9783642010439
T3 - Studies in Classification, Data Analysis, and Knowledge Organization
SP - 229
EP - 239
BT - Advances in Data Analysis, Data Handling and Business Intelligence - Proc. of the 32nd Annual Conference of the Gesellschaft fur Klassifikation e.V., GfKl 2008 - Joint Conference with BCS and VOC
T2 - 32nd Annual Conference of the German Classification Society (Gesellschaft fur Klassifikation) on Advances in Data Analysis, Data Handling and Business Intelligence, GfKl 2008
Y2 - 16 July 2008 through 18 July 2008
ER -