TY - GEN
T1 - Clustering validity assessment
T2 - 1st IEEE International Conference on Data Mining, ICDM'01
AU - Halkidi, Maria
AU - Vazirgiannis, Michalis
PY - 2001/12/1
Y1 - 2001/12/1
AB - Clustering is a mostly unsupervised procedure, and the majority of clustering algorithms depend on certain assumptions in order to define the subgroups present in a data set. As a consequence, in most applications the resulting clustering scheme requires some evaluation of its validity. In this paper we present a clustering validity procedure, which evaluates the results of clustering algorithms on data sets. We define a validity index, S-Dbw, based on well-defined clustering criteria, enabling the selection of the optimal input parameter values for a clustering algorithm that result in the best partitioning of a data set. We evaluate the reliability of our index both theoretically and experimentally, considering three representative clustering algorithms run on synthetic and real data sets. We also carried out an evaluation study to compare the performance of S-Dbw with other known validity indices. Our approach performed favorably in all cases, even in those where other indices failed to indicate the correct partitions in a data set.
UR - https://www.scopus.com/pages/publications/14944348667
M3 - Conference contribution
AN - SCOPUS:14944348667
SN - 0769511198
SN - 9780769511191
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 187
EP - 194
BT - Proceedings - 2001 IEEE International Conference on Data Mining, ICDM'01
Y2 - 29 November 2001 through 2 December 2001
ER -