Near Optimal Output Feedback Control of Nonlinear Discrete-time Systems Based on Reinforcement Neural Network Learning

Qiming Zhao; Hao Xu; Sarangapani Jagannathan

Volume 1 Issue 4

Oct. 2014

IEEE/CAA Journal of Automatica Sinica



Qiming Zhao, Hao Xu and Sarangapani Jagannathan, "Near Optimal Output Feedback Control of Nonlinear Discrete-time Systems Based on Reinforcement Neural Network Learning," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 4, pp. 372-384, 2014.


1. DENSO International America, Inc., MI 48033, USA;
2. College of Science and Engineering, Texas A&M University, TX 78414, USA;
3. Department of Electrical & Computer Engineering, Missouri University of Science and Technology, MO 65401, USA

Funds:

This work was supported by the National Science Foundation (NSF), Division of Electrical, Communications and Cyber Systems (ECCS) (1128281), and the Missouri S&T University Intelligent System Center.

Abstract

In this paper, output-feedback-based finite-horizon near optimal regulation of nonlinear affine discrete-time systems with unknown system dynamics is considered, using neural networks (NNs) to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. First, an NN-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix. Next, a reinforcement learning methodology with an actor-critic structure is utilized to approximate the time-varying solution of the HJB equation, referred to as the value function, by using an NN. To properly satisfy the terminal constraint, a new error term is defined and incorporated in the NN update law so that the terminal constraint error is also minimized over time. An NN with constant weights and a time-dependent activation function is employed to approximate the time-varying value function, which is subsequently utilized to generate the finite-horizon control policy; this policy is near optimal rather than optimal due to NN reconstruction errors. The proposed scheme functions in a forward-in-time manner without an offline training phase. Lyapunov analysis is used to investigate the stability of the overall closed-loop system. Simulation results are given to show the effectiveness and feasibility of the proposed method.
- Finite-horizon
- optimal regulation
- neural network
- approximate dynamic programming
- Hamilton-Jacobi-Bellman equation
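
The idea in the abstract of a value-function NN with constant weights and a time-dependent activation function, tuned with both a Bellman error and a terminal constraint error, can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the polynomial basis in `activation`, the helper names `residuals` and `update_weights`, the normalized gradient step, the learning rate, and the use of the measured state directly (the observer is omitted). The sketch approximates the value function as W^T phi(x, N - k) and evaluates the terminal constraint V(x, N) = psi(x) at the current state, which is valid because that constraint holds for every x.

```python
import numpy as np

def activation(x, time_to_go):
    """Hypothetical time-dependent basis: quadratic terms in x, plus the
    same terms scaled by the time-to-go (N - k)."""
    x = np.atleast_1d(x).astype(float)
    quad = np.outer(x, x)[np.triu_indices(x.size)]  # x_i * x_j for i <= j
    return np.concatenate([quad, quad * time_to_go])

def residuals(W, x_k, x_next, k, N, stage_cost, terminal_cost):
    """Bellman residual along one transition and terminal constraint residual."""
    # V_hat(x_{k+1}, k+1) - V_hat(x_k, k) = W . d_phi
    d_phi = activation(x_next, N - k - 1) - activation(x_k, N - k)
    # At time-to-go 0 the estimate should match the terminal cost psi(x).
    phi_T = activation(x_k, 0)
    e_bellman = stage_cost + W @ d_phi
    e_terminal = W @ phi_T - terminal_cost(x_k)
    return e_bellman, e_terminal, d_phi, phi_T

def update_weights(W, x_k, x_next, k, N, stage_cost, terminal_cost, lr=0.5):
    """One normalized gradient step on 0.5 * (e_bellman^2 + e_terminal^2).
    This is an illustrative update, not the paper's update law."""
    e_b, e_t, d_phi, phi_T = residuals(W, x_k, x_next, k, N,
                                       stage_cost, terminal_cost)
    grad = e_b * d_phi + e_t * phi_T
    norm = 1.0 + d_phi @ d_phi + phi_T @ phi_T  # keeps the step size bounded
    return W - lr * grad / norm
```

Calling `update_weights` once per time step k, with the observed transition from x_k to x_{k+1} and its stage cost, tunes the weights forward in time. Because the terminal residual enters the same update as the Bellman residual, both errors are driven down together rather than in separate phases.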


