IEEE/CAA Journal of Automatica Sinica
Citation: J. Hou, X. Zeng, G. Wang, J. Sun, and J. Chen, “Distributed momentum-based Frank-Wolfe algorithm for stochastic optimization,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 685–699, Mar. 2023. doi: 10.1109/JAS.2022.105923
This paper considers distributed stochastic optimization, in which a number of agents cooperate to optimize a global objective function through local computations and information exchanges with neighbors over a network. Stochastic optimization problems are usually tackled by variants of projected stochastic gradient descent. However, projecting a point onto a feasible set is often expensive. The Frank-Wolfe (FW) method has well-documented merits in handling convex constraints, but existing stochastic FW algorithms have mostly been developed for centralized settings. In this context, the present work puts forth a distributed stochastic Frank-Wolfe solver by judiciously combining Nesterov’s momentum and gradient tracking techniques for stochastic convex and nonconvex optimization over networks. It is shown that the convergence rate of the proposed algorithm is O(k^{-1/2}) for convex optimization and O(1/log₂(k)) for nonconvex optimization. The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives.
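To make the abstract’s algorithmic ingredients concrete, below is a minimal NumPy sketch of a distributed stochastic Frank-Wolfe loop that combines momentum-averaged gradient estimates with gradient tracking and a consensus (mixing) step. Everything here is an illustrative assumption rather than the paper’s method: the constraint set is taken to be an ℓ1-ball with a closed-form linear minimization oracle, `W` is a doubly stochastic mixing matrix, and the `alpha`/`gamma` schedules are generic choices, not the schedules analyzed by the authors.

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle over the l1-ball:
    argmin_{||s||_1 <= radius} <g, s> puts all mass on the
    largest-magnitude coordinate of g."""
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    s[i] = -radius * np.sign(g[i])
    return s

def distributed_momentum_fw(stoch_grads, W, d, num_iters=500, radius=1.0):
    """Illustrative distributed stochastic FW loop (not the authors' exact
    recursion). stoch_grads[i](x) returns a stochastic gradient of agent i's
    local loss at x; W is a doubly stochastic mixing matrix of the network."""
    n = len(stoch_grads)
    X = np.zeros((n, d))        # local iterates, initialized inside the l1-ball
    V = np.zeros((n, d))        # momentum-averaged stochastic gradients
    V_prev = np.zeros((n, d))
    Y = np.zeros((n, d))        # gradient-tracking estimates of the global gradient
    for k in range(num_iters):
        gamma = 2.0 / (k + 2)   # classical FW step size (illustrative)
        alpha = 1.0 / (k + 1)   # momentum weight (illustrative)
        G = np.stack([stoch_grads[i](X[i]) for i in range(n)])
        V = (1.0 - alpha) * V + alpha * G   # momentum smoothing of noisy gradients
        Y = W @ Y + V - V_prev              # gradient tracking across neighbors
        V_prev = V.copy()
        S = np.stack([lmo_l1_ball(Y[i], radius) for i in range(n)])
        X_mix = W @ X                       # consensus (mixing) step
        X = X_mix + gamma * (S - X_mix)     # projection-free update stays feasible
    return X.mean(axis=0)

# Tiny usage example: n agents, each holding a noisy quadratic 0.5*||x - b_i||^2.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4, 10
    B = rng.normal(size=(n, d))
    W = np.full((n, n), 1.0 / n)  # complete graph with uniform weights
    grads = [lambda x, b=B[i]: (x - b) + 0.1 * rng.normal(size=d) for i in range(n)]
    x_hat = distributed_momentum_fw(grads, W, d)
    print("solution l1-norm:", np.abs(x_hat).sum())  # remains within the unit l1-ball
```

The appeal of the FW step shows up in `lmo_l1_ball`: each update is a closed-form, one-coordinate solution of a linear subproblem instead of a potentially expensive projection, and only that oracle would need replacing to handle a different convex constraint set.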