A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 10, Issue 11, Nov. 2023

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Z. Chen and N. Li, “An optimal control-based distributed reinforcement learning framework for a class of non-convex objective functionals of the multi-agent network,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 11, pp. 2081–2093, Nov. 2023. doi: 10.1109/JAS.2022.105992

An Optimal Control-Based Distributed Reinforcement Learning Framework for a Class of Non-Convex Objective Functionals of the Multi-Agent Network

doi: 10.1109/JAS.2022.105992
Funds:  This work was supported in part by the National Natural Science Foundation of China (NSFC) (61773260) and the Ministry of Science and Technology (2018YFB130590)
  • This paper studies a novel distributed optimization problem that aims to minimize the sum of the non-convex objective functionals of a multi-agent network under privacy protection, meaning that the local objective of each agent is unknown to the others. The problem involves complexity in both the time and space aspects, yet existing works on distributed optimization mainly consider privacy protection in the space aspect, where the decision variable is a finite-dimensional vector. In contrast, when the time aspect is considered, as in this paper, the decision variable is a continuous function of time, so the minimization of the overall functional belongs to the calculus of variations. Traditional works usually seek the optimal decision function directly; however, due to privacy protection and non-convexity, the Euler-Lagrange equation of the proposed problem is a complicated partial differential equation. Hence, we seek the derivative of the decision function rather than the decision function itself, which can be regarded as seeking the control input of an optimal control problem. For this problem, we propose a centralized reinforcement learning (RL) framework. In the space aspect, we further present a distributed RL framework to deal with the impact of privacy protection. Finally, rigorous theoretical analysis and simulations validate the effectiveness of our framework.
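    As a rough, non-authoritative sketch of the reformulation described above (the paper's exact running costs, horizon, and constraints may differ), write agent i's functional with a private running cost L_i over an assumed horizon [0, T]; treating the derivative of the decision function as the control input turns the variational problem into an optimal control problem whose Hamilton-Jacobi-Bellman (HJB) equation can be approximated by RL:

    $$
    \begin{aligned}
    &\min_{x(\cdot)} \ \sum_{i=1}^{N} J_i(x) = \sum_{i=1}^{N} \int_{0}^{T} L_i\bigl(x(t), \dot{x}(t), t\bigr)\,\mathrm{d}t
    && \text{(calculus-of-variations form)} \\
    &\text{set } u(t) := \dot{x}(t): \qquad \dot{x}(t) = u(t), \quad x(0) = x_0
    && \text{(optimal-control form)} \\
    &V^{*}(x, t) = \min_{u(\cdot)} \sum_{i=1}^{N} \int_{t}^{T} L_i\bigl(x(\tau), u(\tau), \tau\bigr)\,\mathrm{d}\tau
    && \text{(value function)} \\
    &-\frac{\partial V^{*}}{\partial t} = \min_{u}\Bigl[\sum_{i=1}^{N} L_i(x, u, t) + \Bigl(\frac{\partial V^{*}}{\partial x}\Bigr)^{\!\top} u\Bigr]
    && \text{(HJB equation, approximated by RL)}
    \end{aligned}
    $$

    Solving the HJB equation in this centralized form would require all L_i to be available in one place, which privacy protection rules out; this is what motivates the distributed RL framework in the space aspect.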

     





    Article Metrics

    Article views (679), PDF downloads (139)

    Highlights

    • This paper considers a novel distributed optimization problem in which the decision variable is a time-varying continuous function and the objective functional is the integral of a non-convex function over a continuous time interval. Existing distributed optimization methods cannot deal with such a problem.
    • For the proposed problem, this paper converts the optimization of the functional into an optimal control problem. This conversion builds a bridge between distributed optimal control and distributed optimization. Moreover, a centralized reinforcement learning framework is proposed to solve the transformed problem.
    • This paper also considers privacy protection in distributed optimization. To eliminate its influence, a distributed reinforcement learning framework is further put forward (see the illustrative sketch after this list). Moreover, convergence analysis is also provided.
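
    The following minimal sketch is illustrative only: it shows the generic consensus-plus-local-update pattern that privacy-preserving distributed schemes of this kind typically follow, in which each agent mixes parameter estimates with its neighbours and then takes a local step on its own private objective. The ring graph, Metropolis mixing weights, quadratic surrogate objectives f_i, and gradient-based local step are all assumptions made for this example; the paper's distributed RL framework instead performs an RL-based optimal-control update locally.

    import numpy as np

    rng = np.random.default_rng(0)
    n_agents, dim = 4, 3

    # Private quadratic surrogates f_i(w) = 0.5 * ||A_i w - b_i||^2 (illustration only).
    A = [rng.standard_normal((dim, dim)) for _ in range(n_agents)]
    b = [rng.standard_normal(dim) for _ in range(n_agents)]

    def local_grad(i, w):
        # Gradient of agent i's private objective; known only to agent i.
        return A[i].T @ (A[i] @ w - b[i])

    # Doubly stochastic mixing matrix for a ring graph (each agent has two neighbours).
    W = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n_agents] = 1 / 3
        W[i, (i + 1) % n_agents] = 1 / 3

    w = [np.zeros(dim) for _ in range(n_agents)]
    step = 0.05
    for _ in range(500):
        # 1) Consensus: mix neighbours' estimates (objectives are never exchanged).
        mixed = [sum(W[i, j] * w[j] for j in range(n_agents)) for i in range(n_agents)]
        # 2) Local step on the private objective f_i.
        w = [mixed[i] - step * local_grad(i, mixed[i]) for i in range(n_agents)]

    print("max disagreement:", max(np.linalg.norm(w[i] - w[0]) for i in range(n_agents)))

    Because only the estimates are exchanged over the network, no agent ever reveals its local objective, which is the privacy requirement stated in the abstract.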

