TY - GEN
T1 - Quality scheme assessment in the clustering process
AU - Halkidi, M.
AU - Vazirgiannis, M.
AU - Balislakis, V.
N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2000.
PY - 2000/1/1
Y1 - 2000/1/1
N2 - Clustering is mostly an unsupervised procedure and most of the clustering algorithms depend on assumptions and initial guesses in order to define the subgroups presented in a data set. As a consequence, in most applications the final clusters require some sort of evaluation. The evaluation procedure has to tackle difficult problems, which can be qualitatively expressed as: i. quality of clusters, ii. the degree with which a clustering scheme fits a specific data set, iii. the optimal number of clusters in a partitioning. In this paper we present a scheme for finding the optimal partitioning of a data set during the clustering process regardless of the clustering algorithm used. More specifically, we present an approach for evaluation of clustering schemes (partitions) so as to find the best number of clusters, which occurs in a specific data set. A clustering algorithm produces different partitions for different values of the input parameters. The proposed approach selects the best clustering scheme (i.e., the scheme with the most compact and well-separated clusters), according to a quality index we define. We verified our approach using two popular clustering algorithms on synthetic and real data sets in order to evaluate its reliability. Moreover, we study the influence of different clustering parameters to the proposed quality index.
AB - Clustering is mostly an unsupervised procedure and most of the clustering algorithms depend on assumptions and initial guesses in order to define the subgroups presented in a data set. As a consequence, in most applications the final clusters require some sort of evaluation. The evaluation procedure has to tackle difficult problems, which can be qualitatively expressed as: i. quality of clusters, ii. the degree with which a clustering scheme fits a specific data set, iii. the optimal number of clusters in a partitioning. In this paper we present a scheme for finding the optimal partitioning of a data set during the clustering process regardless of the clustering algorithm used. More specifically, we present an approach for evaluation of clustering schemes (partitions) so as to find the best number of clusters, which occurs in a specific data set. A clustering algorithm produces different partitions for different values of the input parameters. The proposed approach selects the best clustering scheme (i.e., the scheme with the most compact and well-separated clusters), according to a quality index we define. We verified our approach using two popular clustering algorithms on synthetic and real data sets in order to evaluate its reliability. Moreover, we study the influence of different clustering parameters to the proposed quality index.
U2 - 10.1007/3-540-45372-5_26
DO - 10.1007/3-540-45372-5_26
M3 - Conference contribution
AN - SCOPUS:35048874596
SN - 9783540410669
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 265
EP - 276
BT - Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings
A2 - Zighed, Djamel A.
A2 - Komorowski, Jan
A2 - Zytkow, Jan
PB - Springer Verlag
T2 - 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2000
Y2 - 13 September 2000 through 16 September 2000
ER -