TY - GEN
T1 - Optimal algorithms for smooth and strongly convex distributed optimization in networks
AU - Scaman, Kevin
AU - Bach, Francis
AU - Bubeck, Sebastien
AU - Lee, Yin Tat
AU - Massoulie, Laurent
N1 - Publisher Copyright:
Copyright © 2017 by the author(s).
PY - 2017/1/1
Y1 - 2017/1/1
N2 - In this paper, we determine the optimal convergence rates for strongly convex and smooth distributed optimization in two settings: centralized and decentralized communications over a network. For centralized (i.e. master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision ε > 0 in time O(√κ_g (1 + Δτ) ln(1/ε)), where κ_g is the condition number of the (global) function to optimize, Δ is the diameter of the network, and τ (resp. 1) is the time needed to communicate values between two neighbors (resp. perform local computations). For decentralized algorithms based on gossip, we provide the first optimal algorithm, called the multi-step dual accelerated (MSDA) method, that achieves a precision ε > 0 in time O(√κ_l (1 + τ/√γ) ln(1/ε)), where κ_l is the condition number of the local functions and γ is the (normalized) eigengap of the gossip matrix used for communication between nodes. We then verify the efficiency of MSDA against state-of-the-art methods for two problems: least-squares regression and classification by logistic regression.
AB - In this paper, we determine the optimal convergence rates for strongly convex and smooth distributed optimization in two settings: centralized and decentralized communications over a network. For centralized (i.e. master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision ε > 0 in time O(√κ_g (1 + Δτ) ln(1/ε)), where κ_g is the condition number of the (global) function to optimize, Δ is the diameter of the network, and τ (resp. 1) is the time needed to communicate values between two neighbors (resp. perform local computations). For decentralized algorithms based on gossip, we provide the first optimal algorithm, called the multi-step dual accelerated (MSDA) method, that achieves a precision ε > 0 in time O(√κ_l (1 + τ/√γ) ln(1/ε)), where κ_l is the condition number of the local functions and γ is the (normalized) eigengap of the gossip matrix used for communication between nodes. We then verify the efficiency of MSDA against state-of-the-art methods for two problems: least-squares regression and classification by logistic regression.
UR - https://www.scopus.com/pages/publications/85048558781
M3 - Conference contribution
AN - SCOPUS:85048558781
T3 - 34th International Conference on Machine Learning, ICML 2017
SP - 4630
EP - 4642
BT - 34th International Conference on Machine Learning, ICML 2017
PB - International Machine Learning Society (IMLS)
T2 - 34th International Conference on Machine Learning, ICML 2017
Y2 - 6 August 2017 through 11 August 2017
ER -