A Data-Based Feedback Relearning Algorithm for Uncertain Nonlinear Systems

Chaoxu Mu; Yong Zhang; Guangbin Cai; Ruijun Liu; Changyin Sun

doi:10.1109/JAS.2023.123186

Volume 10 Issue 5

May 2023

IEEE/CAA Journal of Automatica Sinica



C. X. Mu, Y. Zhang, G. B. Cai, R. J. Liu, and C. Y. Sun, “A data-based feedback relearning algorithm for uncertain nonlinear systems,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 5, pp. 1288–1303, May 2023. doi: 10.1109/JAS.2023.123186


Chaoxu Mu^1, Yong Zhang^1, Guangbin Cai^2, Ruijun Liu^3, Changyin Sun^4

1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
2. College of Missile Engineering, Rocket Force University of Engineering, Xi’an 710025, China
3. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
4. School of Automation, Southeast University, Nanjing 210096, China

Funds: This work was supported in part by the National Key Research and Development Program of China (2021YFB1714700) and the National Natural Science Foundation of China (62022061, 6192100028)

Author Bio:
Chaoxu Mu received the Ph.D. degree in control science and engineering from the School of Automation, Southeast University in 2012. She was a Visiting Ph.D. Student with the Royal Melbourne Institute of Technology University, Australia, from 2010 to 2011, and a Postdoctoral Fellow with the Department of Electrical, Computer and Biomedical Engineering, The University of Rhode Island, USA, from 2014 to 2016. She is currently a Professor with the School of Electrical and Information Engineering, Tianjin University. Her current research interests include nonlinear system control and optimization, and adaptive and learning systems. She has authored more than 100 journal and conference papers and co-authored three monographs.

Yong Zhang received the B.S. degree in automation and the M.S. degree in control engineering from Tianjin University in 2017 and 2020, respectively. He is currently a Ph.D. candidate in control science and engineering with the School of Electrical and Information Engineering, Tianjin University. His research interests include adaptive robust control, data-based reinforcement learning algorithms, autonomous learning systems, and applications to physical systems such as aerospace, smart grids, and precision servo mechanisms.

Guangbin Cai received the Ph.D. degree in control science and engineering from the Rocket Force University of Engineering in 2012. He was a recipient of the Excellent Doctoral Dissertation award in Shaanxi. He is currently an Associate Professor with the College of Missile Engineering, Rocket Force University of Engineering. His main research interests include guidance and control theory and its applications to aircraft.

Ruijun Liu received the M.S. degree in computer application technology from Beihang University in 2009, and the Ph.D. degree in computer science and technology from the Ecole Centrale de Nantes, France in 2013. He is currently with the Beijing Technology and Business University. His current research interests include machine learning, virtual reality, and 3D reconstruction.

Changyin Sun (Senior Member, IEEE) received the B.S. degree in applied mathematics from the Department of Mathematics, Sichuan University in 1996, and the M.S. and Ph.D. degrees in electrical engineering from Southeast University in 2001 and 2003, respectively. From March 2004 to September 2004, he was a Postdoctoral Fellow in the Department of Computer Science, The Chinese University of Hong Kong, China. From April 2006 to March 2007, he was a Professor in the School of Electrical Engineering, Hohai University. Since March 2007, he has been a Professor in the School of Automation, Southeast University. His current research interests include intelligent control, flight control, pattern recognition, and optimal theory. He is an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems, Neural Processing Letters, and the IEEE/CAA Journal of Automatica Sinica.
Corresponding author: Chaoxu Mu, e-mail: cxmu@tju.edu.cn
Received Date: 2022-03-25
Revised Date: 2022-06-07
Accepted Date: 2022-08-20

Available Online: 2023-03-14

Abstract

In this paper, a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems. Motivated by the classical on-policy and off-policy algorithms of reinforcement learning, an online feedback relearning (FR) algorithm is developed in which the collected data include the influence of disturbance signals. Compared with the off-policy algorithm, the FR algorithm adapts better to environmental changes (such as control channel disturbances); compared with the on-policy algorithm, it has higher computational efficiency and better convergence performance. Data processing based on experience replay is used to improve data efficiency and convergence stability. Simulation experiments illustrate the convergence stability, optimality, and algorithmic performance of the FR algorithm through comparison with other algorithms.
Keywords: Data episodes, experience replay, neural networks, reinforcement learning (RL), uncertain systems

Highlights

This paper investigates a new data-based reinforcement learning algorithm for the robust control of uncertain nonlinear systems. Under uncertainty, the collected data may affect the optimality of the off-policy algorithm, and the resulting non-optimal solution may cause divergence. Meanwhile, the data utilization efficiency and convergence of the on-policy algorithm are not as good as those of the off-policy algorithm. The proposed data-based feedback relearning algorithm can effectively deal with both problems.
A data processing method based on experience replay is designed to improve data utilization efficiency and algorithm convergence. It reduces the correlation between adjacent collected data episodes and alleviates the matrix singularity problem. Many reported data-based reinforcement learning algorithms have not carefully considered this issue.
The convergence stability, optimality, and algorithmic performance of the proposed algorithm are analyzed through comparative experiments with other typical algorithms. These contributions are new relative to the state of the art.
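
The experience replay idea mentioned in the highlights can be illustrated with a minimal Python sketch. It is not the authors' data processing method, which this page does not describe. The class `EpisodeReplayBuffer`, the helper `collect_episodes`, the scalar system `x_next = 0.9*x + 0.1*u`, and the feedback law `u = -0.5*x` are all hypothetical and chosen only for illustration. The sketch shows one generic way to weaken the correlation between adjacent data episodes: store them in a fixed-size buffer and draw a random batch instead of using them in time order.

```python
import random
from collections import deque


class EpisodeReplayBuffer:
    """Fixed-size store of data episodes with uniform random sampling.

    Hypothetical illustration; not the data processing method of the paper.
    """

    def __init__(self, capacity, seed=None):
        # deque(maxlen=...) drops the oldest episode once capacity is reached.
        self._episodes = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, control, next_state, cost):
        self._episodes.append((state, control, next_state, cost))

    def sample(self, batch_size):
        # Sampling without replacement breaks the time ordering, so the
        # batch is less dominated by adjacent, highly correlated episodes.
        k = min(batch_size, len(self._episodes))
        return self._rng.sample(list(self._episodes), k)

    def __len__(self):
        return len(self._episodes)


def collect_episodes(buffer, steps, x0=1.0):
    """Roll out an assumed scalar system x_next = 0.9*x + 0.1*u."""
    x = x0
    for _ in range(steps):
        u = -0.5 * x                  # placeholder feedback law (assumption)
        x_next = 0.9 * x + 0.1 * u
        cost = x * x + u * u          # quadratic stage cost (assumption)
        buffer.add(x, u, x_next, cost)
        x = x_next


buffer = EpisodeReplayBuffer(capacity=50, seed=0)
collect_episodes(buffer, steps=100)
batch = buffer.sample(8)              # 8 episodes drawn from the last 50
```

In a least-squares policy evaluation step, a batch drawn this way would typically replace a window of consecutive samples when building the regression matrix. Whether this matches the construction used in the paper is an assumption.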
