TY - JOUR
T1 - The deep latent position topic model for clustering and representation of networks with textual edges
AU - Boutin, Rémi
AU - Latouche, Pierre
AU - Bouveyron, Charles
N1 - Publisher Copyright: © 2025 The Author(s). Scandinavian Journal of Statistics published by John Wiley & Sons Ltd on behalf of The Board of the Foundation of the Scandinavian Journal of Statistics.
PY - 2025/12/1
Y1 - 2025/12/1
N2 - Digital interactions in which users share textual content published by others are naturally represented by a network where the individuals are associated with the nodes and the exchanged texts with the edges. To understand these heterogeneous and complex data structures, it is necessary both to cluster the nodes into homogeneous groups and to render a comprehensible visualization of the data. To address both issues, we introduce Deep-LPTM, a model-based clustering strategy relying on a variational graph auto-encoder approach and a probabilistic model to characterize the discussion topics. Deep-LPTM builds a joint representation of the nodes and the edges in two embedding spaces. The parameters are inferred using a variational inference algorithm. We also introduce IC2L, a model selection criterion specifically designed to choose models with relevant clustering and visualization properties. An extensive benchmark study on synthetic data is provided. In particular, we find that Deep-LPTM recovers the partitions of the nodes better than the state-of-the-art ETSBM and STBM. Finally, the emails of the Enron company are analyzed and visualizations of the results are presented, with meaningful highlights of the graph structure.
AB - Digital interactions in which users share textual content published by others are naturally represented by a network where the individuals are associated with the nodes and the exchanged texts with the edges. To understand these heterogeneous and complex data structures, it is necessary both to cluster the nodes into homogeneous groups and to render a comprehensible visualization of the data. To address both issues, we introduce Deep-LPTM, a model-based clustering strategy relying on a variational graph auto-encoder approach and a probabilistic model to characterize the discussion topics. Deep-LPTM builds a joint representation of the nodes and the edges in two embedding spaces. The parameters are inferred using a variational inference algorithm. We also introduce IC2L, a model selection criterion specifically designed to choose models with relevant clustering and visualization properties. An extensive benchmark study on synthetic data is provided. In particular, we find that Deep-LPTM recovers the partitions of the nodes better than the state-of-the-art ETSBM and STBM. Finally, the emails of the Enron company are analyzed and visualizations of the results are presented, with meaningful highlights of the graph structure.
KW - deep latent position model
KW - embedded topic model
KW - graph convolutional network
KW - unsupervised learning
UR - https://www.scopus.com/pages/publications/105014732897
U2 - 10.1111/sjos.70016
DO - 10.1111/sjos.70016
M3 - Article
AN - SCOPUS:105014732897
SN - 0303-6898
VL - 52
SP - 1975
EP - 2013
JO - Scandinavian Journal of Statistics
JF - Scandinavian Journal of Statistics
IS - 4
ER -