Stability Selection and Consensus Clustering in R: The R Package sharp

Barbara Bodinier, Sabrina Rodrigues, Maryam Karimi, Sarah Filippi, Julien Chiquet, Marc Chadeau-Hyam

Research output: Contribution to journal › Article › peer-review

Abstract

The R package sharp (Stability-enHanced Approaches using Resampling Procedures) provides an integrated framework for stability-enhanced variable selection, graphical modeling and clustering. In stability selection, a feature selection algorithm is combined with a resampling technique to estimate feature selection probabilities. Features with selection proportions above a threshold are considered stably selected. Similarly, in consensus clustering, a clustering algorithm is applied to multiple subsamples of items to compute co-membership proportions. The consensus clusters are then obtained by clustering with the co-membership proportions as the similarity measure. We calibrate the hyper-parameters of stability selection (or consensus clustering) jointly by maximizing a consensus score calculated under the null hypothesis of equiprobability of selection (or co-membership), which characterizes instability. The package offers flexibility in the modeling, includes diagnostic and visualization tools, and allows for parallelization.
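To make the stability selection idea concrete, here is a minimal Python sketch of selection proportions and thresholding. It is not the sharp API (sharp is an R package), and it does not implement the consensus-score calibration. As a simplification, the base selector is a hypothetical stand-in that keeps the k features with the largest absolute marginal correlation with the outcome; in practice a penalized regression such as the lasso would typically play this role. The function names, the subsample fraction of 0.5, and the threshold value are illustrative assumptions.

```python
import numpy as np


def top_k_selector(X, y, k):
    """Hypothetical base selector (stand-in for e.g. the lasso):
    keep the k features with the largest absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    selected = np.zeros(X.shape[1], dtype=bool)
    selected[np.argsort(corr)[-k:]] = True
    return selected


def selection_proportions(X, y, k, n_subsamples=100, fraction=0.5, seed=0):
    """Run the base selector on random subsamples drawn without replacement
    and return, for each feature, the proportion of subsamples selecting it."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(fraction * n)
    counts = np.zeros(X.shape[1])
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=m, replace=False)
        counts += top_k_selector(X[idx], y[idx], k)
    return counts / n_subsamples


def stably_selected(proportions, threshold):
    """Indices of features whose selection proportion reaches the threshold."""
    return np.flatnonzero(proportions >= threshold)
```

For example, with simulated data in which only features 0 and 3 carry signal, these two features should be selected in (nearly) every subsample, while the remaining selection slot is spread thinly across the noise features:

```python
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(200)
props = selection_proportions(X, y, k=3)
print(stably_selected(props, threshold=0.9))
```

In this sketch k and the threshold are fixed by hand; calibrating such hyper-parameters jointly is what the consensus score described above is for.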

Original language: English
Pages (from-to): 1-27
Number of pages: 27
Journal: Journal of Statistical Software
Volume: 112
Issue number: 5
Publication status: Published 1 Jan 2025
Externally published: Yes

Keywords

  • R
  • calibration
  • consensus clustering
  • graphical modeling
  • regularization
  • stability selection
  • structural equation modeling
  • variable selection

