missSBM: An R Package for Handling Missing Values in the Stochastic Block Model

Research output: Contribution to journal › Article › peer-review

Abstract

The stochastic block model is a popular probabilistic model for random graphs. It is commonly used to cluster network data by aggregating nodes that share similar connectivity patterns into blocks. When fitting a stochastic block model to a partially observed network, it is important to consider the underlying process that generates the missing values; otherwise the inference may be biased. This paper presents missSBM, an R package that fits stochastic block models when the network is partially observed, i.e., the adjacency matrix contains not only 1s or 0s encoding the presence or absence of edges, but also NAs encoding the missing information between pairs of nodes. This package implements a set of algorithms to fit the binary stochastic block model, possibly in the presence of external covariates, by performing variational inference suitable for several observation processes. Our implementation automatically explores different numbers of blocks to select the most relevant model according to the integrated classification likelihood criterion. The same criterion can also help determine which observation process best fits a given dataset. Finally, missSBM can be used to impute missing entries in the adjacency matrix. We illustrate the package on a network dataset consisting of interactions between political blogs sampled during the 2007 French presidential election.
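The imputation step mentioned above can be illustrated with a minimal sketch under strong simplifying assumptions. It is written in Python rather than R and is not missSBM's algorithm or API. Here the block memberships `z` are taken as already known (missSBM instead infers them by variational inference). The network is binary, undirected, and without self-loops. Missing dyads (`None`, playing the role of R's `NA`) are assumed to be missing at random, so they can simply be skipped when estimating connectivity. The helper names `estimate_connectivity` and `impute` are hypothetical.

```python
from itertools import combinations


def estimate_connectivity(adj, z, n_blocks):
    """Estimate block-to-block edge probabilities pi[k][l] from observed dyads only.

    adj: symmetric list of lists holding 1, 0, or None (None = missing dyad).
    z: block label of each node, in range(n_blocks); assumed known in this sketch.
    """
    edges = [[0] * n_blocks for _ in range(n_blocks)]
    pairs = [[0] * n_blocks for _ in range(n_blocks)]
    for i, j in combinations(range(len(adj)), 2):  # each unordered dyad once
        if adj[i][j] is None:
            continue  # skipping is only valid if the missingness is ignorable
        k, l = z[i], z[j]
        for a, b in {(k, l), (l, k)}:  # keep pi symmetric; one entry when k == l
            edges[a][b] += adj[i][j]
            pairs[a][b] += 1
    return [[edges[k][l] / pairs[k][l] if pairs[k][l] else 0.0
             for l in range(n_blocks)] for k in range(n_blocks)]


def impute(adj, z, pi):
    """Replace each missing entry (i, j) by its estimated edge probability pi[z_i][z_j]."""
    n = len(adj)
    return [[pi[z[i]][z[j]] if adj[i][j] is None else adj[i][j]
             for j in range(n)] for i in range(n)]


N = None  # missing dyad
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, N, 1, 0],
    [1, N, 0, N, 0],
    [0, 1, N, 0, 1],
    [0, 0, 0, 1, 0],
]
z = [0, 0, 0, 1, 1]  # hypothetical, known block memberships

pi = estimate_connectivity(adj, z, 2)
print(pi)  # → [[1.0, 0.2], [0.2, 1.0]]
filled = impute(adj, z, pi)
print(filled[2][3])  # → 0.2
```

Skipping missing dyads is only justified when the sampling design is ignorable. Under a non-ignorable design (for example, when whether a dyad is observed depends on whether it is an edge), this naive estimate is biased, which is the situation the abstract warns about.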

Original language: English
Pages (from-to): 1-32
Number of pages: 32
Journal: Journal of Statistical Software
Volume: 101
Issue number: 12
Publication status: Published - 1 Jan 2022
Externally published: Yes

Keywords

  • Missing data
  • Network
  • Stochastic block model
