TY - GEN
T1 - Adaptive sampling for incremental optimization using stochastic gradient descent
AU - Papa, Guillaume
AU - Bianchi, Pascal
AU - Clémençon, Stéphan
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2015.
PY - 2015/1/1
Y1 - 2015/1/1
N2 - A wide collection of popular statistical learning methods, ranging from K-means to Support Vector Machines and Neural Networks, can be formulated as a stochastic gradient descent (SGD) algorithm in a specific setup. In practice, the main limitation of this incremental optimization technique is the stochastic noise induced by the random selection of the data used to compute the gradient estimate at each iteration. The purpose of this paper is to introduce a novel implementation of the SGD algorithm in which the data subset used at a given step is not picked uniformly at random among all possible subsets, but drawn from a specific adaptive sampling scheme depending on the past iterations in a Markovian manner, in order to refine the current statistical estimate of the gradient. Beyond an algorithmic description of the proposed approach, rate bounds are established and illustrative numerical results are presented, providing theoretical and empirical evidence of its statistical performance compared to more “naive” SGD implementations. Computational issues are also discussed at length, revealing the practical advantages of the proposed method.
AB - A wide collection of popular statistical learning methods, ranging from K-means to Support Vector Machines and Neural Networks, can be formulated as a stochastic gradient descent (SGD) algorithm in a specific setup. In practice, the main limitation of this incremental optimization technique is the stochastic noise induced by the random selection of the data used to compute the gradient estimate at each iteration. The purpose of this paper is to introduce a novel implementation of the SGD algorithm in which the data subset used at a given step is not picked uniformly at random among all possible subsets, but drawn from a specific adaptive sampling scheme depending on the past iterations in a Markovian manner, in order to refine the current statistical estimate of the gradient. Beyond an algorithmic description of the proposed approach, rate bounds are established and illustrative numerical results are presented, providing theoretical and empirical evidence of its statistical performance compared to more “naive” SGD implementations. Computational issues are also discussed at length, revealing the practical advantages of the proposed method.
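The abstract sketches the key idea: instead of drawing the mini-batch uniformly, SGD draws it from an adaptive distribution that depends on the past iterates. The following is a minimal Python sketch of this style of non-uniform sampling SGD on a toy least-squares problem; the weight-update rule (per-example gradient norms mixed with the uniform law) is a generic importance-sampling heuristic chosen for illustration, not the specific Markovian scheme analyzed in the paper.

import numpy as np

# Toy least-squares problem: minimize (1/n) * sum_i (x_i . w - y_i)^2 / 2.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
scores = np.ones(n)   # running per-example gradient-norm estimates (illustrative heuristic)
eta = 0.05            # constant step size, for simplicity

for t in range(5000):
    # Mix the adaptive weights with the uniform distribution so every p_i stays positive.
    p = 0.5 / n + 0.5 * scores / scores.sum()
    i = rng.choice(n, p=p)                 # draw one example from the adaptive scheme
    g = (X[i] @ w - y[i]) * X[i]           # gradient of the i-th loss term
    w -= eta * g / (n * p[i])              # importance weight keeps the estimate unbiased
    scores[i] = np.linalg.norm(g) + 1e-12  # adapt: the scheme depends on the past iterate

The division by n * p[i] makes the sampled gradient an unbiased estimator of the full gradient under the non-uniform law, which is what allows the usual SGD convergence arguments to carry over while the adaptive scheme reduces the variance of the estimate.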
U2 - 10.1007/978-3-319-24486-0_21
DO - 10.1007/978-3-319-24486-0_21
M3 - Conference contribution
AN - SCOPUS:84945938091
SN - 9783319244853
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 317
EP - 331
BT - Algorithmic Learning Theory - 26th International Conference, ALT 2015
A2 - Gentile, Claudio
A2 - Zilles, Sandra
A2 - Chaudhuri, Kamalika
PB - Springer Verlag
T2 - 26th International Conference on Algorithmic Learning Theory, ALT 2015
Y2 - 4 October 2015 through 6 October 2015
ER -