IEEE/CAA Journal of Automatica Sinica, Volume 9, Issue 5, May 2022

Citation: X. L. Yi, S. J. Zhang, T. Yang, T. Y. Chai, and K. H. Johansson, “A primal-dual SGD algorithm for distributed nonconvex optimization,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 5, pp. 812–833, May 2022. doi: 10.1109/JAS.2022.105554

# A Primal-Dual SGD Algorithm for Distributed Nonconvex Optimization

##### doi: 10.1109/JAS.2022.105554
Funds:  This work was supported by the Knut and Alice Wallenberg Foundation, the Swedish Foundation for Strategic Research, the Swedish Research Council, and the National Natural Science Foundation of China (62133003, 61991403, 61991404, 61991400)
• The distributed nonconvex optimization problem of minimizing a global cost function formed by a sum of $n$ local cost functions by using local information exchange is considered. This problem is an important component of many machine learning techniques with data parallelism, such as deep learning and federated learning. We propose a distributed primal-dual stochastic gradient descent (SGD) algorithm, suitable for arbitrarily connected communication networks and any smooth (possibly nonconvex) cost functions. We show that the proposed algorithm achieves the linear speedup convergence rate ${\cal{O}}(1/\sqrt{nT})$ for general nonconvex cost functions and the linear speedup convergence rate ${\cal{O}}(1/(nT))$ when the global cost function satisfies the Polyak-Łojasiewicz (P-Ł) condition, where $T$ is the total number of iterations. We also show that the output of the proposed algorithm with constant parameters linearly converges to a neighborhood of a global optimum. We demonstrate through numerical experiments the efficiency of our algorithm in comparison with the baseline centralized SGD and recently proposed distributed SGD algorithms.
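To make the setting concrete, below is a minimal, hedged sketch of one synchronous iteration of a generic distributed primal-dual SGD scheme over a communication graph. The update form, the Laplacian-based consensus term, the function name `primal_dual_sgd_step`, and the parameters `eta`, `alpha`, and `beta` are illustrative assumptions, not the paper's exact algorithm or parameterization; the toy quadratic local costs are only for demonstration.

```python
import numpy as np

def primal_dual_sgd_step(X, V, L, stoch_grads, eta, alpha, beta):
    """One illustrative synchronous primal-dual SGD iteration over all n agents.

    X           : (n, d) array, agent primal variables x_i
    V           : (n, d) array, agent dual variables v_i
    L           : (n, n) graph Laplacian of the communication network
    stoch_grads : (n, d) array, stochastic gradients of the local costs at X
    eta, alpha, beta : positive step-size parameters (illustrative names)
    """
    consensus = L @ X                      # neighborhood disagreement term L x
    X_next = X - eta * (alpha * consensus + beta * V + stoch_grads)
    V_next = V + eta * beta * consensus    # dual step driving the agents to consensus
    return X_next, V_next


# Toy usage: n = 4 agents on a ring, local costs f_i(x) = 0.5 * ||x - b_i||^2,
# whose global minimizer is the mean of the b_i.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4, 3
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)  # ring adjacency matrix
    L = np.diag(A.sum(axis=1)) - A             # graph Laplacian
    b = rng.normal(size=(n, d))
    X = rng.normal(size=(n, d))
    V = np.zeros((n, d))
    for _ in range(2000):
        noise = 0.01 * rng.normal(size=(n, d))  # stochastic gradient noise
        grads = (X - b) + noise
        X, V = primal_dual_sgd_step(X, V, L, grads, eta=0.05, alpha=1.0, beta=1.0)
    print("agent estimates (should be close to the mean of b):")
    print(X)
    print("global optimum:", b.mean(axis=0))
```

In this sketch each agent only combines its own stochastic gradient with information from its graph neighbors (through the Laplacian product), which is the kind of local information exchange the abstract refers to; the specific step sizes and convergence behavior of the paper's algorithm are established in the article itself.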

