A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 10, Issue 9, September 2023

IEEE/CAA Journal of Automatica Sinica

• JCR Impact Factor: 11.8, Top 4% (SCI Q1)
• CiteScore: 17.6, Top 3% (Q1)
• Google Scholar h5-index: 77, Top 5
Citation: D. Wang, J. Y. Wang, M. M. Zhao, P. Xin, and J. F. Qiao, “Adaptive multi-step evaluation design with stability guarantee for discrete-time optimal learning control,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 9, pp. 1797–1809, Sept. 2023. doi: 10.1109/JAS.2023.123684

Adaptive Multi-Step Evaluation Design With Stability Guarantee for Discrete-Time Optimal Learning Control

doi: 10.1109/JAS.2023.123684
Funds:  This work was supported in part by the National Key Research and Development Program of China (2021ZD0112302), the National Natural Science Foundation of China (62222301, 61890930-5, 62021003), and the Beijing Natural Science Foundation (JQ19013)
Abstract: This paper develops a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, when initialized with the zero cost function, MsHDP converges to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. The stability of the system under the control policies generated by MsHDP is then analyzed, and a general stability criterion is designed to determine the admissibility of the current control policy; this criterion applies not only to traditional value iteration and policy iteration but also to MsHDP. Based on the convergence results and the stability criterion, the integrated MsHDP algorithm, which makes use of immature control policies, is developed to greatly accelerate learning. The scheme is implemented within an actor-critic structure, where neural networks serve as the parametric architecture for evaluating and improving the iterative policy. Finally, two simulation examples demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
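To make the multi-step evaluation idea concrete, the sketch below specializes it to a linear-quadratic problem, where the cost function is quadratic, V_j(x) = x'P_j x, and the N-step evaluation of the greedy policy admits a closed-form update; N = 1 recovers ordinary value iteration, while larger N performs more evaluation steps per policy improvement. This is a minimal illustration under assumed dynamics and parameters, not the paper's neural-network actor-critic implementation; the matrices, step counts, and function names (`multi_step_evaluation`, `ms_hdp`) are illustrative.

```python
# Minimal sketch of N-step policy evaluation in heuristic dynamic programming,
# specialized to a linear-quadratic problem so the critic V_j(x) = x' P_j x
# admits a closed-form update. N = 1 recovers ordinary value iteration; larger
# N performs more evaluation steps per policy improvement. All quantities here
# (A, B, Q, R, the step counts) are illustrative assumptions.
import numpy as np

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])          # toy dynamics x_{k+1} = A x_k + B u_k
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)                       # stage cost U(x, u) = x'Qx + u'Ru
R = np.array([[1.0]])

def greedy_gain(P):
    """Policy improvement: mu(x) = -K x minimizes U(x, u) + V(Ax + Bu)."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def multi_step_evaluation(P, N):
    """N-step evaluation of the greedy policy under the current critic P."""
    K = greedy_gain(P)
    Ac = A - B @ K                  # closed-loop dynamics under mu_j
    Uc = Q + K.T @ R @ K            # stage cost along the closed loop
    P_next = np.zeros_like(P)
    Phi = np.eye(A.shape[0])        # Phi = Ac^i
    for _ in range(N):
        P_next += Phi.T @ Uc @ Phi  # accumulate N stage costs
        Phi = Ac @ Phi
    return P_next + Phi.T @ P @ Phi # bootstrap with the old critic at step N

def ms_hdp(N, tol=1e-10, max_iter=5000):
    P = np.zeros_like(Q)            # zero initial cost function
    for j in range(max_iter):
        P_next = multi_step_evaluation(P, N)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, j + 1
        P = P_next
    return P, max_iter

for N in (1, 3, 10):
    P_star, iters = ms_hdp(N)
    print(f"N = {N:2d}: {iters:4d} iterations, trace(P) = {np.trace(P_star):.6f}")
```

Under these assumptions all three runs converge to the same quadratic cost, with larger N typically needing fewer outer iterations, which mirrors the acceleration argument made for the integrated scheme.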

     



Figures (13) / Tables (2)

    Article Metrics

Article views: 536; PDF downloads: 70

    Highlights

• Starting from the zero initial cost function, the convergence and monotonicity properties of the multi-step heuristic dynamic programming (MsHDP) framework are investigated in order to solve the discrete-time optimal learning control problem.
• The traditional stability criterion is extended: a novel stability condition is designed to determine system stability under the current policy. This condition applies not only to traditional value iteration and policy iteration but also to MsHDP, which clearly reflects the connections and differences among these methods (a generic check in this spirit is sketched after this list).
• Based on the stability and convergence analyses, a novel integrated MsHDP algorithm is developed that accelerates the entire learning process without requiring an initial admissible policy.
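As a rough illustration of the second highlight, the sketch below applies a generic Lyapunov-difference test to the greedy policy induced by a quadratic critic on the same toy linear-quadratic system as above: the policy is accepted only if the critic decreases by at least the stage cost along the closed loop on sampled states. The system, sample set, tolerance, and function names (`greedy_gain`, `is_admissible`) are assumptions for illustration; this is not the specific criterion derived in the paper.

```python
# Generic Lyapunov-difference admissibility test of the kind used in
# adaptive-critic stability analyses: accept the greedy policy induced by the
# critic P only if the critic decreases by at least the stage cost along the
# closed loop on every sampled state. Illustrative only; not the paper's
# specific criterion.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # same assumed toy system as above
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

def greedy_gain(P):
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def is_admissible(P, samples, tol=1e-8):
    """Check V(x_next) - V(x) <= -U(x, u) for the greedy policy of critic P."""
    K = greedy_gain(P)
    for x in samples:
        u = -K @ x
        x_next = A @ x + B @ u
        decrease = float(x_next.T @ P @ x_next - x.T @ P @ x)
        stage = float(x.T @ Q @ x + u.T @ R @ u)
        if decrease > -stage + tol:
            return False
    return True

rng = np.random.default_rng(0)
samples = [rng.standard_normal((2, 1)) for _ in range(200)]

P = np.zeros((2, 2))                      # zero initial critic
print("zero critic admissible:", is_admissible(P, samples))

for _ in range(500):                      # plain one-step value iteration
    K = greedy_gain(P)
    Ac = A - B @ K
    P = Q + K.T @ R @ K + Ac.T @ P @ Ac
print("converged critic admissible:", is_admissible(P, samples))
```

In this toy setting the zero initial critic fails the test while a converged critic passes it, which is the role such a criterion plays when deciding whether an immature iterative policy can already be trusted to stabilize the plant.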
