SGD algorithms based on incomplete U-statistics: Large-scale minimization of empirical risk

Guillaume Papa, Stéphan Clémençon, Aurélien Bellet

Research output: Contribution to journal › Conference article › peer-review

Abstract

In many learning problems, ranging from clustering to ranking through metric learning, empirical estimates of the risk functional consist of an average over tuples (e.g., pairs or triplets) of observations, rather than over individual observations. In this paper, we focus on how to best implement a stochastic approximation approach to solve such risk minimization problems. We argue that in the large-scale setting, gradient estimates should be obtained by sampling tuples of data points with replacement (incomplete U-statistics) instead of sampling data points without replacement (complete U-statistics based on subsamples). We develop a theoretical framework accounting for the substantial impact of this strategy on the generalization ability of the prediction model returned by the Stochastic Gradient Descent (SGD) algorithm. It reveals that the method we promote achieves a much better trade-off between statistical accuracy and computational cost. Beyond the rate bound analysis, experiments on AUC maximization and metric learning provide strong empirical evidence of the superiority of the proposed approach.
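To make the sampling strategy concrete, the sketch below illustrates one SGD step in which the gradient of a pairwise risk is estimated by an incomplete U-statistic, i.e. by drawing B pairs of observations uniformly with replacement rather than subsampling data points. This is a minimal, assumption-laden illustration (not the authors' code): the pairwise logistic surrogate for AUC maximization with a linear scorer, the function names, and the batch size B = 50 are all illustrative choices, not taken from the paper.

```python
import numpy as np

def pairwise_grad(w, x_pos, x_neg):
    """Gradient of log(1 + exp(-(w·x_pos - w·x_neg))) with respect to w,
    an illustrative smooth surrogate of the AUC pairwise 0-1 loss."""
    margin = w @ (x_pos - x_neg)
    return -(x_pos - x_neg) / (1.0 + np.exp(margin))

def sgd_step_incomplete(w, X, y, step_size, B, rng):
    """One SGD step whose gradient estimate is an incomplete U-statistic:
    B positive-negative pairs are drawn uniformly *with replacement* from
    the full set of pairs, instead of forming all pairs of a subsample."""
    pos, neg = np.where(y == 1)[0], np.where(y == -1)[0]
    i = rng.choice(pos, size=B, replace=True)   # positive examples
    j = rng.choice(neg, size=B, replace=True)   # negative examples
    grad = np.mean([pairwise_grad(w, X[a], X[b]) for a, b in zip(i, j)], axis=0)
    return w - step_size * grad

# Usage on synthetic data (hypothetical setup for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=1000))
w = np.zeros(20)
for t in range(1, 201):
    w = sgd_step_incomplete(w, X, y, step_size=1.0 / np.sqrt(t), B=B if (B := 50) else 50, rng=rng)
```

The key design point mirrored here is that the cost of one step is governed by B, the number of sampled pairs, independently of the total number of pairs, which is what allows the statistical accuracy versus computation trade-off discussed in the abstract.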

Original language: English
Pages (from-to): 1027-1035
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Volume: 2015-January
Publication status: Published - 1 Jan 2015
Externally published: Yes
Event: 29th Annual Conference on Neural Information Processing Systems, NIPS 2015 - Montreal, Canada
Duration: 7 Dec 2015 - 12 Dec 2015
