TY - GEN
T1 - The Perturbed Prox-Preconditioned SPIDER Algorithm
T2 - 21st IEEE Statistical Signal Processing Workshop, SSP 2021
AU - Fort, G.
AU - Moulines, E.
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/7/11
AB - A novel algorithm named Perturbed Prox-Preconditioned SPIDER (3P-SPIDER) is introduced. It is a stochastic variance-reduced proximal-gradient-type algorithm built on the Stochastic Path Integral Differential EstimatoR (SPIDER), an algorithm known to achieve a near-optimal first-order oracle inequality for nonconvex and nonsmooth optimization. Compared to the vanilla prox-SPIDER, 3P-SPIDER uses preconditioned gradient estimators. Preconditioning can either be applied explicitly to a gradient estimator or be introduced implicitly, as in applications to the EM algorithm. 3P-SPIDER also allows the preconditioned gradients to be unknown in closed analytical form, in which case they must be approximated, which adds a further degree of perturbation. Studying the convergence in expectation, we show that 3P-SPIDER achieves a near-optimal oracle inequality O(n^{1/2}/ϵ), where n is the number of observations and ϵ the target precision, even when the gradient is estimated by Monte Carlo methods. We illustrate the algorithm on an application to the minimization of a penalized empirical loss.
KW - Control Variates
KW - Finite sum optimization
KW - Large Scale Learning
KW - Statistical Learning
KW - Variance-reduced stochastic gradient
DO - 10.1109/SSP49050.2021.9513846
M3 - Conference contribution
AN - SCOPUS:85113430027
T3 - IEEE Workshop on Statistical Signal Processing Proceedings
SP - 96
EP - 100
BT - 2021 IEEE Statistical Signal Processing Workshop, SSP 2021
PB - IEEE Computer Society
Y2 - 11 July 2021 through 14 July 2021
ER -