IEEE/CAA Journal of Automatica Sinica
Citation: Z. Chen and N. Li, "An optimal control-based distributed reinforcement learning framework for a class of non-convex objective functionals of the multi-agent network," IEEE/CAA J. Autom. Sinica, vol. 10, no. 11, pp. 2081–2093, Nov. 2023. doi: 10.1109/JAS.2022.105992
This paper studies a novel distributed optimization problem that aims to minimize the sum of the non-convex objective functionals of a multi-agent network under privacy protection, meaning that the local objective of each agent is unknown to the others. This problem is complex in both the time and space aspects simultaneously. Existing work on distributed optimization mainly considers privacy protection in the space aspect, where the decision variable is a finite-dimensional vector. In contrast, when the time aspect is considered, as in this paper, the decision variable is a continuous function of time, so minimizing the overall functional belongs to the calculus of variations. Traditional approaches seek the optimal decision function directly. However, owing to privacy protection and non-convexity, the Euler-Lagrange equation of the proposed problem is a complicated partial differential equation. We therefore seek the optimal derivative of the decision function rather than the decision function itself. This can be regarded as seeking the control input of an optimal control problem, for which we propose a centralized reinforcement learning (RL) framework. In the space aspect, we further present a distributed RL framework to handle the impact of privacy protection. Finally, rigorous theoretical analysis and simulations validate the effectiveness of our framework.
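The reformulation described above, from a variational problem to an optimal control problem, can be sketched as follows. The notation here is illustrative and not taken from the paper: $f_i$ denotes agent $i$'s private objective functional integrand, $x(\cdot)$ the decision function, and $[0, T]$ the horizon.

```latex
% Original variational problem: minimize the sum of N private functionals
% over the decision function x(.), which leads to the Euler-Lagrange
% condition (a complicated PDE under non-convexity and privacy):
\min_{x(\cdot)} \; J\big(x(\cdot)\big)
  = \int_{0}^{T} \sum_{i=1}^{N} f_i\big(x(t), \dot{x}(t), t\big)\, dt,
\qquad
\sum_{i=1}^{N} \left(
  \frac{\partial f_i}{\partial x}
  - \frac{d}{dt}\frac{\partial f_i}{\partial \dot{x}}
\right) = 0 .

% Reformulation: treat the derivative of the decision function as a
% control input u(.), turning the problem into an optimal control
% problem with trivial integrator dynamics, amenable to RL:
\min_{u(\cdot)} \; J\big(u(\cdot)\big)
  = \int_{0}^{T} \sum_{i=1}^{N} f_i\big(x(t), u(t), t\big)\, dt,
\qquad \text{s.t.} \quad \dot{x}(t) = u(t).
```

Under this formulation, an RL scheme can approximate the optimal control $u^{*}(\cdot)$ without solving the Euler-Lagrange PDE explicitly, and the distributed version lets each agent work only with its private $f_i$.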