On U-processes and clustering performance

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Many clustering techniques aim at optimizing empirical criteria that are of the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the excess of clustering risk is proved to be of the order Oℙ (1√n). Based on recent results related to the tail behavior of degenerate U-processes, it is also shown how to establish tighter rate bounds. Model selection issues, related to the number of clusters forming the data partition in particular, are also considered.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 24
Subtitle of host publication25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
PublisherNeural Information Processing Systems
ISBN (Print)9781618395993
Publication statusPublished - 1 Jan 2011
Externally publishedYes
Event25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011 - Granada, Spain
Duration: 12 Dec 201114 Dec 2011

Publication series

NameAdvances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011

Conference

Conference25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
Country/TerritorySpain
CityGranada
Period12/12/1114/12/11

Fingerprint

Dive into the research topics of 'On U-processes and clustering performance'. Together they form a unique fingerprint.

Cite this