IEEE/CAA Journal of Automatica Sinica
A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 9, Issue 7, Jul. 2022

M. M. Ha, D. Wang, and D. Liu, “Discounted iterative adaptive critic designs with novel stability analysis for tracking control,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1262–1272, Jul. 2022. doi: 10.1109/JAS.2022.105692

Discounted Iterative Adaptive Critic Designs With Novel Stability Analysis for Tracking Control

doi: 10.1109/JAS.2022.105692
Funds:  This work was supported in part by Beijing Natural Science Foundation (JQ19013), the National Key Research and Development Program of China (2021ZD0112302), and the National Natural Science Foundation of China (61773373)
  • The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop the value-iteration-based adaptive critic framework for solving the tracking control problem. Unlike in the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is therefore developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function is elaborated for the special case of linear systems. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with that of the traditional approaches.
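As a rough illustration of the value-iteration idea in the abstract, the sketch below runs discounted value iteration on a gridded tracking error for a hypothetical scalar linear plant. The plant parameters, grid, discount factor, and cost weights are all illustrative assumptions, not the paper's setup or algorithm.

```python
import numpy as np

# Hypothetical scalar plant x' = a*x + b*u asked to track a constant
# reference r. Grid the tracking error e = x - r and iterate
#   V_{i+1}(e) = min_u [ e^2 + rho*u^2 + gamma * V_i(e_next) ].
a, b, r = 0.9, 1.0, 1.0
gamma, rho = 0.95, 0.5

errs = np.linspace(-2.0, 2.0, 201)   # grid over the tracking error e
acts = np.linspace(-2.0, 2.0, 81)    # candidate controls u
V = np.zeros_like(errs)              # V_0 = 0, the usual VI start

for _ in range(200):
    # Next error after applying u at state x = e + r:
    #   e_next = a*(e + r) + b*u - r
    e_next = a * (errs[:, None] + r) + b * acts[None, :] - r
    V_next = np.interp(e_next, errs, V)   # evaluate V_i off-grid
    Q = errs[:, None] ** 2 + rho * acts[None, :] ** 2 + gamma * V_next
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = acts[Q.argmin(axis=1)]      # greedy control at each grid error
```

Because the discount factor is strictly below one, each sweep is a contraction, so the iterates approach a fixed point; the greedy policy then drives the gridded error toward zero, up to grid and interpolation resolution.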




    Highlights

    • The core findings
    • Based on the new performance index function, a novel stability analysis method for the tracking control problem is established, which guarantees that the tracking error can be eliminated completely. The effect of the approximation errors introduced by the value function approximator on the stability of the controlled system is discussed. For linear systems, a new VI-based adaptive critic scheme that iterates between the kernel matrix and the state feedback gain is developed
    • The essence of the research
    • Optimal tracking control is a significant topic in the control community. Some tracking control methods solve for the feedforward control of the reference trajectory and transform the tracking control problem into a regulator problem; however, the feedforward control input might not exist. Other methods establish a cost function of the tracking error and control input, under which the tracking error cannot be eliminated completely. It is therefore necessary to adopt the new cost function and develop a novel stability analysis method to guarantee that the tracking error converges to zero
    • The distinction of the paper
    • This paper adopts a new performance index function to develop the value-iteration-based adaptive critic framework for solving the tracking control problem. Unlike in the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error can be eliminated completely. Besides, the effect of the approximation errors introduced by the value function approximator on the stability of the controlled system is discussed
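    For the linear special case mentioned in the highlights, an iteration between a kernel matrix and a state feedback gain can be sketched with the textbook discounted Riccati-style value iteration below. The matrices A, B, Qc, R and the discount gamma are illustrative assumptions, and this standard recursion is not necessarily the exact scheme of the paper.

```python
import numpy as np

# Value iteration on the kernel matrix P for a linear plant
# x_{k+1} = A x_k + B u_k with cost sum_k gamma^k (x'Qc x + u'R u).
# Standard discounted Riccati recursion; all numbers are illustrative.
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Qc = np.eye(2)          # state (tracking-error) weight
R = np.array([[1.0]])   # control weight
gamma = 0.95

P = np.zeros((2, 2))    # P_0 = 0
for _ in range(500):
    # State feedback gain induced by the current kernel matrix:
    #   K = gamma (R + gamma B'PB)^{-1} B'PA,  u = -K x
    K = gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
    # Kernel-matrix update:
    #   P <- Qc + gamma A'PA - gamma^2 A'PB (R + gamma B'PB)^{-1} B'PA
    P_new = Qc + gamma * A.T @ P @ A - gamma * A.T @ P @ B @ K
    if np.max(np.abs(P_new - P)) < 1e-10:
        P = P_new
        break
    P = P_new

Acl = A - B @ K         # closed-loop matrix under u = -K x
```

    A discounted problem with plant (A, B) is equivalent to an undiscounted one with (sqrt(gamma) A, sqrt(gamma) B), so convergence of this iteration yields a gain for which sqrt(gamma) (A - BK) is Schur, matching the role the discount factor plays in the stability analysis.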
