IEEE/CAA Journal of Automatica Sinica
Citation: J. Hou, X. Zeng, G. Wang, J. Sun, and J. Chen, “Distributed momentum-based Frank-Wolfe algorithm for stochastic optimization,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 685–699, Mar. 2023. doi: 10.1109/JAS.2022.105923
This paper considers distributed stochastic optimization, in which a number of agents cooperate to optimize a global objective function through local computations and information exchanges with neighbors over a network. Stochastic optimization problems are usually tackled by variants of projected stochastic gradient descent. However, projecting a point onto a feasible set is often expensive. The Frank-Wolfe (FW) method has well-documented merits in handling convex constraints, but existing stochastic FW algorithms are developed mainly for centralized settings. In this context, the present work puts forth a distributed stochastic Frank-Wolfe solver, by judiciously combining Nesterov’s momentum and gradient tracking techniques for stochastic convex and nonconvex optimization over networks. It is shown that the convergence rate of the proposed algorithm is $\mathcal{O}(k^{-1/2})$ for convex optimization, and $\mathcal{O}(1/\log_2(k))$ for nonconvex optimization. The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives.
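The core idea the abstract describes is a projection-free Frank-Wolfe update driven by a momentum-averaged stochastic gradient: instead of projecting, each iterate moves toward a vertex returned by a linear minimization oracle, which keeps it feasible by convexity. A minimal single-agent sketch of that mechanism is given below, assuming a least-squares objective over an ℓ1-ball constraint; the function names, step-size schedules, and problem instance are illustrative and not the paper's exact distributed method.

```python
import numpy as np

def lmo_l1(d, radius):
    # Linear minimization oracle over the l1 ball:
    # argmin_{||v||_1 <= radius} <d, v> is a signed, scaled coordinate vertex.
    i = np.argmax(np.abs(d))
    v = np.zeros_like(d)
    v[i] = -radius * np.sign(d[i])
    return v

def momentum_frank_wolfe(A, b, radius=1.0, iters=500, batch=8, seed=0):
    # Minimize 0.5*||Ax - b||^2 subject to ||x||_1 <= radius,
    # using minibatch gradients smoothed by momentum averaging.
    rng = np.random.default_rng(seed)
    n, p = A.shape
    x = np.zeros(p)   # feasible starting point (inside the l1 ball)
    d = np.zeros(p)   # momentum-averaged gradient estimate
    for k in range(1, iters + 1):
        idx = rng.integers(0, n, size=batch)
        # stochastic gradient of the least-squares loss on a minibatch
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch
        rho = 2.0 / (k + 1)    # decaying averaging weight
        d = (1 - rho) * d + rho * g
        gamma = 2.0 / (k + 2)  # standard Frank-Wolfe step size
        # convex combination with an LMO vertex: no projection needed
        x = (1 - gamma) * x + gamma * lmo_l1(d, radius)
    return x
```

Because every update is a convex combination of the current iterate and a point of the constraint set, feasibility is preserved automatically; the momentum average `d` reduces the variance of the gradient estimate fed to the oracle.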