TY - JOUR
T1 - Optimal convergence rates for convex distributed optimization in networks
AU - Scaman, Kevin
AU - Bach, Francis
AU - Bubeck, Sébastien
AU - Lee, Yin Tat
AU - Massoulié, Laurent
N1 - Publisher Copyright:
©c 2019 Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee and Laurent Massoulié.
PY - 2019/9/1
Y1 - 2019/9/1
N2 - This work proposes a theoretical analysis of distributed optimization of convex functions using a network of computing units. We investigate this problem under two communication schemes (centralized and decentralized) and four classical regularity assumptions: Lipschitz continuity, strong convexity, smoothness, and a combination of strong convexity and smoothness. Under the decentralized communication scheme, we provide matching upper and lower bounds of complexity along with algorithms achieving this rate up to logarithmic constants. For non-smooth objective functions, while the dominant term of the error is in O(1/√t), the structure of the communication network only impacts a second-order term in O(1/t), where t is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly convex objective functions. Such a convergence rate is achieved by the novel multi-step primal-dual (MSPD) algorithm. Under the centralized communication scheme, we show that the naive distribution of standard optimization algorithms is optimal for smooth objective functions, and provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function for non-smooth functions. We then show that DRS is within a d1/4 multiplicative factor of the optimal convergence rate, where d is the underlying dimension.
AB - This work proposes a theoretical analysis of distributed optimization of convex functions using a network of computing units. We investigate this problem under two communication schemes (centralized and decentralized) and four classical regularity assumptions: Lipschitz continuity, strong convexity, smoothness, and a combination of strong convexity and smoothness. Under the decentralized communication scheme, we provide matching upper and lower bounds of complexity along with algorithms achieving this rate up to logarithmic constants. For non-smooth objective functions, while the dominant term of the error is in O(1/√t), the structure of the communication network only impacts a second-order term in O(1/t), where t is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly convex objective functions. Such a convergence rate is achieved by the novel multi-step primal-dual (MSPD) algorithm. Under the centralized communication scheme, we show that the naive distribution of standard optimization algorithms is optimal for smooth objective functions, and provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function for non-smooth functions. We then show that DRS is within a d1/4 multiplicative factor of the optimal convergence rate, where d is the underlying dimension.
KW - Convex optimization
KW - Distributed optimization
KW - First-order methods
M3 - Article
AN - SCOPUS:85077514256
SN - 1532-4435
VL - 20
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -