Noise-free latent block model for high dimensional data

Charlotte Laclau, Vincent Brault

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Co-clustering is known to be a very powerful and efficient approach in unsupervised learning because of its ability to partition data based on both the observations and the variables of a given dataset. However, in a high-dimensional context, co-clustering methods may fail to provide meaningful results due to the presence of noisy and/or irrelevant features. In this paper, we tackle this issue by proposing a novel co-clustering model that assumes the existence of a noise cluster containing all irrelevant features. A variational expectation-maximization (VEM) algorithm is derived for this task, in which automatic variable selection and the joint clustering of objects and variables are both achieved within a Bayesian framework. Experimental results on synthetic datasets show the efficiency of our model in the context of high-dimensional noisy data. Finally, we demonstrate the usefulness of the approach on two real datasets used to study genetic diversity across the world.
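To make the noise-cluster assumption concrete, the sketch below simulates a data matrix of the kind the model targets: rows are split into row clusters, relevant columns into column clusters, and one extra column cluster of noise features whose distribution does not depend on the row partition. This is a minimal illustration under stated assumptions, not the authors' model or their VEM algorithm: the abstract does not specify the cell distribution, so Gaussian cells with a shared standard deviation are assumed here, and the function `simulate_noisy_lbm` and all of its parameters are hypothetical.

```python
import random
from typing import List, Tuple


def simulate_noisy_lbm(
    n_rows: int,
    cols_per_cluster: List[int],
    block_means: List[List[float]],
    n_noise_cols: int,
    noise_mean: float = 0.0,
    sd: float = 1.0,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int], List[int]]:
    """Hypothetical sketch: Gaussian latent-block data plus one noise column cluster.

    Assumption (not from the paper): block_means[k][l] is the mean of every cell
    in row cluster k and relevant column cluster l, and all cells share the
    standard deviation sd. Noise columns ignore the row partition entirely:
    each of their cells is drawn from N(noise_mean, sd^2).
    """
    rng = random.Random(seed)
    g = len(block_means)        # number of row clusters
    m = len(cols_per_cluster)   # number of relevant column clusters
    if any(len(row) != m for row in block_means):
        raise ValueError("block_means must be a g x m matrix")

    # Row labels: each row joins one of the g row clusters uniformly at random.
    row_labels = [rng.randrange(g) for _ in range(n_rows)]
    # Column labels: relevant clusters are 0..m-1; label m marks the noise cluster.
    col_labels = [l for l, size in enumerate(cols_per_cluster) for _ in range(size)]
    col_labels += [m] * n_noise_cols

    data = []
    for k in row_labels:
        row = []
        for l in col_labels:
            # Relevant cells depend on (row cluster, column cluster);
            # noise cells depend on neither.
            mean = noise_mean if l == m else block_means[k][l]
            row.append(rng.gauss(mean, sd))
        data.append(row)
    return data, row_labels, col_labels
```

For example, `simulate_noisy_lbm(100, [5, 5], [[0.0, 3.0], [3.0, 0.0]], n_noise_cols=50)` yields 10 informative columns buried among 50 noise columns, which is the kind of high-dimensional noisy setting in which the abstract says standard co-clustering can fail.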

Original language: English
Pages (from-to): 446-473
Number of pages: 28
Journal: Data Mining and Knowledge Discovery
Volume: 33
Issue number: 2
Publication status: Published, 15 Mar 2019
Externally published: Yes

Keywords

  • Biclustering
  • Clustering
  • Feature selection
  • High dimensional data
  • Latent block model
