TY - GEN
T1 - Persistence-based clustering in Riemannian manifolds
AU - Chazal, Frédéric
AU - Oudot, Steve
AU - Skraba, Primoz
AU - Guibas, Leonidas J.
PY - 2011/1/1
Y1 - 2011/1/1
N2 - We present a clustering scheme that combines a mode-seeking phase with a cluster merging phase in the corresponding density map. While mode detection is done by a standard graph-based hill-climbing scheme, the novelty of our approach resides in its use of topological persistence to guide the merging of clusters. Our algorithm provides additional feedback in the form of a set of points in the plane, called a persistence diagram (PD), which provably reects the prominences of the modes of the density. In practice, this feedback enables the user to choose relevant parameter values, so that under mild sampling conditions the algorithm will output the correct number of clusters, a notion that can be made formally sound within persistence theory. The algorithm only requires rough estimates of the density at the data points, and knowledge of (approximate) pairwise distances between them. It is therefore applicable in any metric space. Meanwhile, its complexity remains practical: although the size of the input distance matrix may be up to quadratic in the number of data points, a careful implementation only uses a linear amount of memory and takes barely more time to run than to read through the input. In this conference version of the paper we emphasize the experimental aspects of our work, describing the approach, iving an intuitive overview of its theoretical guarantees, discussing the choice of its parameters in practice, and demonstrating its potential in terms of applications through a series of experimental results obtained on synthetic and real-life data sets. Precise statements and proofs of our theoretical claims can be found in the full version of the paper [7].
AB - We present a clustering scheme that combines a mode-seeking phase with a cluster merging phase in the corresponding density map. While mode detection is done by a standard graph-based hill-climbing scheme, the novelty of our approach resides in its use of topological persistence to guide the merging of clusters. Our algorithm provides additional feedback in the form of a set of points in the plane, called a persistence diagram (PD), which provably reects the prominences of the modes of the density. In practice, this feedback enables the user to choose relevant parameter values, so that under mild sampling conditions the algorithm will output the correct number of clusters, a notion that can be made formally sound within persistence theory. The algorithm only requires rough estimates of the density at the data points, and knowledge of (approximate) pairwise distances between them. It is therefore applicable in any metric space. Meanwhile, its complexity remains practical: although the size of the input distance matrix may be up to quadratic in the number of data points, a careful implementation only uses a linear amount of memory and takes barely more time to run than to read through the input. In this conference version of the paper we emphasize the experimental aspects of our work, describing the approach, iving an intuitive overview of its theoretical guarantees, discussing the choice of its parameters in practice, and demonstrating its potential in terms of applications through a series of experimental results obtained on synthetic and real-life data sets. Precise statements and proofs of our theoretical claims can be found in the full version of the paper [7].
KW - Clustering
KW - Mode Seeking
KW - Morse Theory
KW - Topological Persistence
KW - Unsupervised Learning
UR - https://www.scopus.com/pages/publications/79960162399
U2 - 10.1145/1998196.1998212
DO - 10.1145/1998196.1998212
M3 - Conference contribution
AN - SCOPUS:79960162399
SN - 9781450306829
T3 - Proceedings of the Annual Symposium on Computational Geometry
SP - 97
EP - 106
BT - Proceedings of the 27th Annual Symposium on Computational Geometry, SCG'11
PB - Association for Computing Machinery
ER -