A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 11, Issue 1, Jan. 2024

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: D. Wang, N. Gao, D. Liu, J. Li, and F. Lewis, “Recent progress in reinforcement learning and adaptive dynamic programming for advanced control applications,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 1, pp. 18–36, Jan. 2024. doi: 10.1109/JAS.2023.123843

Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications

doi: 10.1109/JAS.2023.123843
Funds: This work was supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and Beijing Natural Science Foundation (JQ19013)
Abstract
  • Reinforcement learning (RL) has roots in dynamic programming and is known as adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP and RL and their applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Effective offline and online algorithms for ADP/adaptive critic control are presented, and the main results for discrete-time and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control under the event-triggered framework and in uncertain environments is discussed, covering event-based design, robust stabilization, and game design. Moreover, extensions of ADP for addressing control problems in complex environments have attracted considerable attention. The ADP architecture is revisited from the data-driven and RL perspectives, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
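For concreteness, the regulation problem underlying the surveyed discrete-time ADP algorithms can be stated in generic form; the symbols $F$, $U$, and $\gamma$ below are placeholders introduced for illustration rather than notation taken from any particular reference. For a system $x_{k+1} = F(x_k, u_k)$ with stage utility $U(x_k, u_k) \ge 0$ and discount factor $\gamma \in (0, 1]$, the cost to be minimized and the Bellman optimality equation are

$$ J(x_k) = \sum_{i=k}^{\infty} \gamma^{\,i-k} U(x_i, u_i), \qquad J^{*}(x_k) = \min_{u_k} \big\{ U(x_k, u_k) + \gamma J^{*}(x_{k+1}) \big\}. $$

Value iteration approximates $J^{*}$ through the recursion

$$ V_{j+1}(x_k) = \min_{u_k} \big\{ U(x_k, u_k) + \gamma V_j\big(F(x_k, u_k)\big) \big\}, \qquad V_0(\cdot) \equiv 0, $$

with the associated policy $\mu_j(x_k) = \arg\min_{u_k} \{ U(x_k, u_k) + \gamma V_j(F(x_k, u_k)) \}$, whereas policy iteration alternates policy evaluation and policy improvement starting from an admissible control law; adaptive critic implementations approximate $V_j$ and $\mu_j$ with critic and actor networks.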

     




    Highlights

    • This paper provides a comprehensive survey of recent developments in adaptive dynamic programming/reinforcement learning (ADP/RL) and their applications in various advanced control fields. Extensive work on both discrete-time and continuous-time systems is summarized (a minimal value-iteration sketch is given after this list)
    • The integration of ADP/RL with methods and problems such as event-triggered control, robust stabilization, game design, distributed system design, and model predictive control is discussed. The ADP architecture is also revisited from the data-driven and RL perspectives, demonstrating the broad applicability of ADP
    • Several typical control applications of RL and ADP are summarized, especially in the areas of wastewater treatment processes and power systems. In addition, a general prospect for future research is given, covering the combination of ADP with approximation error analysis, high-dimensional system control, advanced networking techniques and communication protocols, brain-like intelligence, and parallel control
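
    As a complement to the highlights above, the following NumPy sketch illustrates the basic value-iteration adaptive-critic recursion on a hypothetical scalar plant. The dynamics, quadratic utility, discount factor, state/action grids, and the interpolation-based critic are all illustrative assumptions chosen for this example; they are not models or algorithms taken from the surveyed works.

        # Minimal value-iteration ADP sketch (illustrative only).
        # Hypothetical plant: x_{k+1} = 0.9*x_k + 0.5*u_k*cos(x_k); utility U(x,u) = x^2 + u^2.
        import numpy as np

        def dynamics(x, u):
            # Hypothetical nonaffine discrete-time plant used only for illustration.
            return 0.9 * x + 0.5 * u * np.cos(x)

        def utility(x, u):
            # Quadratic stage cost.
            return x ** 2 + u ** 2

        gamma = 0.95                          # discount factor (assumed)
        xs = np.linspace(-2.0, 2.0, 201)      # state grid: the tabulated "critic"
        us = np.linspace(-2.0, 2.0, 101)      # action grid for the pointwise minimization
        V = np.zeros_like(xs)                 # V_0 = 0, the usual value-iteration start

        def critic(V, x):
            # Piecewise-linear interpolation of the tabulated cost-to-go.
            return np.interp(np.clip(x, xs[0], xs[-1]), xs, V)

        for j in range(200):                  # value-iteration sweeps
            X, Ug = np.meshgrid(xs, us, indexing="ij")
            Q = utility(X, Ug) + gamma * critic(V, dynamics(X, Ug))
            V_new = Q.min(axis=1)             # V_{j+1}(x) = min_u {U(x,u) + gamma*V_j(F(x,u))}
            if np.max(np.abs(V_new - V)) < 1e-6:
                V = V_new
                break
            V = V_new

        def policy(x):
            # Greedy (near-optimal) control recovered from the converged critic.
            q = utility(x, us) + gamma * critic(V, dynamics(x, us))
            return us[np.argmin(q)]

        x = 1.5                               # closed-loop rollout from x_0 = 1.5
        for k in range(10):
            u = policy(x)
            print(f"k={k:2d}  x={x:+.4f}  u={u:+.4f}")
            x = dynamics(x, u)

    The tabulated, interpolation-based critic keeps the sketch dependency-free; in the surveyed works the cost-to-go and control law are instead approximated by neural networks trained from data.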
