Abstract
We propose hard and fuzzy diagonal co-clustering algorithms, built upon double K-means, to address the problem of document-term co-clustering. At each iteration, the proposed algorithms seek a diagonal block structure in the data by minimizing a criterion based on both the within-class variance and the centroid effect. In addition to being easy to interpret and effective on sparse binary and continuous data, the proposed algorithms, Hard Diagonal Double K-means (DDKM) and Fuzzy Diagonal Double K-means (F-DDKM), are also faster than other state-of-the-art clustering algorithms. We evaluate our contribution on synthetic data sets and on real data sets commonly used in document clustering.
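The idea of a hard diagonal co-clustering loop can be illustrated with a minimal sketch. This is not the criterion or the update rules of DDKM, which the abstract does not spell out. It assumes a simplified squared-error model in which an entry is approximated by a block mean `mu[k]` when its row and column share cluster `k`, and by 0 otherwise. The names `diagonal_cost` and `diagonal_double_kmeans` and all parameters are hypothetical.

```python
import numpy as np

def diagonal_cost(X, z, w, mu):
    # Assumed model: x_ij ~ mu[k] if row i and column j are both in cluster k, else 0.
    approx = np.where(z[:, None] == w[None, :], mu[z][:, None], 0.0)
    return float(((X - approx) ** 2).sum())

def diagonal_double_kmeans(X, g, n_iter=20, seed=0):
    # Illustrative alternating minimization; rows and columns share g clusters.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = rng.integers(g, size=n)   # row labels
    w = rng.integers(g, size=d)   # column labels
    mu = np.ones(g)               # one value per diagonal block
    history = []
    for _ in range(n_iter):
        # Update each block value to the mean of its diagonal block.
        for k in range(g):
            block = X[np.ix_(z == k, w == k)]
            if block.size:
                mu[k] = block.mean()
        # Reassign rows. Up to a per-row constant, the cost of putting row i
        # in cluster k is -2 * mu_k * sum_{j in k} x_ij + mu_k^2 * |columns in k|.
        W = np.eye(g)[w]                                  # d x g indicator matrix
        row_cost = -2 * (X @ W) * mu + mu**2 * W.sum(axis=0)
        z = row_cost.argmin(axis=1)
        # Reassign columns symmetrically, using the new row labels.
        Z = np.eye(g)[z]                                  # n x g indicator matrix
        col_cost = -2 * (X.T @ Z) * mu + mu**2 * Z.sum(axis=0)
        w = col_cost.argmin(axis=1)
        history.append(diagonal_cost(X, z, w, mu))
    return z, w, mu, history
```

Each of the three steps minimizes the assumed cost with the other two quantities held fixed, so the recorded cost does not increase from one iteration to the next. Like K-means, the result depends on the random initialization and may be a local minimum.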
| Original language | English |
|---|---|
| Pages (from-to) | 133-147 |
| Number of pages | 15 |
| Journal | Neurocomputing |
| Volume | 193 |
| Publication status | Published - 12 Jun 2016 |
| Externally published | Yes |
Keywords
- Co-clustering
- Document clustering
- Fuzzy co-clustering