Abstract
Co-clustering is known to be a very powerful and efficient approach in unsupervised learning because of its ability to partition data based on both the observations and the variables of a given dataset. However, in high-dimensional contexts, co-clustering methods may fail to provide a meaningful result due to the presence of noisy and/or irrelevant features. In this paper, we tackle this issue by proposing a novel co-clustering model which assumes the existence of a noise cluster that contains all irrelevant features. A variational expectation-maximization-based algorithm is derived for this task, where the automatic variable selection as well as the joint clustering of objects and variables are achieved via a Bayesian framework. Experimental results on synthetic datasets show the efficiency of our model in the context of high-dimensional noisy data. Finally, we highlight the interest of the approach on two real datasets whose goal is to study genetic diversity across the world.
| Original language | English |
|---|---|
| Pages (from-to) | 446-473 |
| Number of pages | 28 |
| Journal | Data Mining and Knowledge Discovery |
| Volume | 33 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - 15 Mar 2019 |
| Externally published | Yes |
Keywords
- Biclustering
- Clustering
- Feature selection
- High dimensional data
- Latent block model