Citation: Y. Yang, H. Modares, K. G. Vamvoudakis, and F. L. Lewis, “Composite adaptive critic design,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 12, pp. 2525–2540, Dec. 2025. doi: 10.1109/JAS.2025.125435
References

[1] M. Xin and H. Pan, “Indirect robust control of spacecraft via optimal control solution,” IEEE Trans. Aerosp. Electron. Syst., vol. 48, no. 2, pp. 1798–1809, Apr. 2012. doi: 10.1109/TAES.2012.6178102
[2] H. Waschl, I. Kolmanovsky, M. Steinbuch, and L. del Re, Optimization and Optimal Control in Automotive Systems. Cham, Switzerland: Springer, 2014.
[3] Y. Yang, Z. Ding, R. Wang, H. Modares, and D. C. Wunsch, “Data-driven human-robot interaction without velocity measurement using off-policy reinforcement learning,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 1, pp. 47–63, Jan. 2022. doi: 10.1109/JAS.2021.1004258
[4] J. L. Speyer and D. H. Jacobson, Primer on Optimal Control Theory. Philadelphia, USA: Society for Industrial and Applied Mathematics, 2010.
[5] D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton, USA: Princeton University Press, 2012.
[6] D. P. Bertsekas, Dynamic Programming and Optimal Control, 4th ed. Belmont, USA: Athena Scientific, 2012.
[7] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control, 3rd ed. Hoboken, USA: John Wiley & Sons, 2012.
[8] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, Feb. 2009. doi: 10.1016/j.automatica.2008.08.017
[9] H. Modares, F. L. Lewis, and M. B. Naghibi-Sistani, “Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems,” Automatica, vol. 50, no. 1, pp. 193–202, Jan. 2014. doi: 10.1016/j.automatica.2013.09.043
[10] C. Chen, H. Modares, K. Xie, F. L. Lewis, Y. Wan, and S. Xie, “Reinforcement learning-based adaptive optimal exponential tracking control of linear systems with unknown dynamics,” IEEE Trans. Autom. Control, vol. 64, no. 11, pp. 4423–4438, Nov. 2019. doi: 10.1109/TAC.2019.2905215
[11] D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, “Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming,” Automatica, vol. 48, no. 8, pp. 1825–1832, Aug. 2012. doi: 10.1016/j.automatica.2012.05.049
[12] Y. Jiang and Z. P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, Oct. 2012. doi: 10.1016/j.automatica.2012.06.096
[13] W. Gao and Z. P. Jiang, “Adaptive dynamic programming and adaptive optimal output regulation of linear systems,” IEEE Trans. Autom. Control, vol. 61, no. 12, pp. 4164–4169, Dec. 2016. doi: 10.1109/TAC.2016.2548662
[14] D. Wang, J. Wang, M. Zhao, P. Xin, and J. Qiao, “Adaptive multi-step evaluation design with stability guarantee for discrete-time optimal learning control,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 9, pp. 1797–1809, Sept. 2023. doi: 10.1109/JAS.2023.123684
[15] K. G. Vamvoudakis and F. L. Lewis, “Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem,” Automatica, vol. 46, no. 5, pp. 878–888, May 2010. doi: 10.1016/j.automatica.2010.02.018
[16] R. Kamalapurkar, J. A. Rosenfeld, and W. E. Dixon, “Efficient model-based reinforcement learning for approximate online optimal control,” Automatica, vol. 74, pp. 247–258, Dec. 2016. doi: 10.1016/j.automatica.2016.08.004
[17] D. Wang and X. Zhong, “Advanced policy learning near-optimal regulation,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 743–749, May 2019. doi: 10.1109/JAS.2019.1911489
[18] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Hoboken, USA: John Wiley & Sons, 2012.
[19] M. Ha, D. Wang, and D. Liu, “Discounted iterative adaptive critic designs with novel stability analysis for tracking control,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1262–1272, Jul. 2022. doi: 10.1109/JAS.2022.105692
[20] Y. Yang, D. C. Wunsch, and Y. Yin, “Hamiltonian-driven adaptive dynamic programming for continuous nonlinear dynamical systems,” IEEE Trans. Neural Netw. Learning Syst., vol. 28, no. 8, pp. 1929–1940, Aug. 2017. doi: 10.1109/TNNLS.2017.2654324
[21] S. Bhasin, R. Kamalapurkar, M. Johnson, K. G. Vamvoudakis, F. L. Lewis, and W. E. Dixon, “A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems,” Automatica, vol. 49, no. 1, pp. 82–92, Jan. 2013. doi: 10.1016/j.automatica.2012.09.019
[22] G. Chowdhary, T. Yucelen, M. Mühlegg, and E. N. Johnson, “Concurrent learning adaptive control of linear systems with exponentially convergent bounds,” Int. J. Adapt. Control Signal Process., vol. 27, no. 4, pp. 280–301, Apr. 2013. doi: 10.1002/acs.2297
[23] S. K. Jha, S. B. Roy, and S. Bhasin, “Initial excitation-based iterative algorithm for approximate optimal control of completely unknown LTI systems,” IEEE Trans. Autom. Control, vol. 64, no. 12, pp. 5230–5237, Dec. 2019. doi: 10.1109/TAC.2019.2912828
[24] B. Kiumarsi, F. L. Lewis, and Z. P. Jiang, “H∞ control of linear discrete-time systems: Off-policy reinforcement learning,” Automatica, vol. 78, pp. 144–152, Apr. 2017. doi: 10.1016/j.automatica.2016.12.009
[25] Y. Yang, B. Kiumarsi, H. Modares, and C. Xu, “Model-free λ-policy iteration for discrete-time linear quadratic regulation,” IEEE Trans. Neural Netw. Learning Syst., vol. 34, no. 2, pp. 635–649, Feb. 2023. doi: 10.1109/TNNLS.2021.3098985
[26] S. Adam, L. Busoniu, and R. Babuska, “Experience replay for real-time reinforcement learning control,” IEEE Trans. Syst., Man, Cybernetics, Part C (Appl. Rev.), vol. 42, no. 2, pp. 201–212, Mar. 2012. doi: 10.1109/TSMCC.2011.2106494
[27] L. J. Lin, “Self-improving reactive agents based on reinforcement learning, planning and teaching,” Mach. Learn., vol. 8, no. 3, pp. 293–321, May 1992. doi: 10.1023/A:1022628806385
[28] R. Liu and J. Zou, “The effects of memory replay in reinforcement learning,” in Proc. 56th Annu. Allerton Conf. Communication, Control, and Computing, Monticello, USA, 2018, pp. 478–485.
[29] R. Kamalapurkar, B. Reish, G. Chowdhary, and W. E. Dixon, “Concurrent learning for parameter estimation using dynamic state-derivative estimators,” IEEE Trans. Autom. Control, vol. 62, no. 7, pp. 3594–3601, Jul. 2017. doi: 10.1109/TAC.2017.2671343
[30] G. Chowdhary, “Concurrent learning for convergence in adaptive control without persistency of excitation,” Ph.D. dissertation, Georgia Institute of Technology, Atlanta, USA, 2010.
[31] G. Chowdhary, M. Mühlegg, and E. Johnson, “Exponential parameter and tracking error convergence guarantees for adaptive controllers without persistency of excitation,” Int. J. Control, vol. 87, no. 8, pp. 1583–1603, May 2014. doi: 10.1080/00207179.2014.880128
[32] R. Kamalapurkar, P. Walters, and W. E. Dixon, “Model-based reinforcement learning for approximate optimal regulation,” Automatica, vol. 64, pp. 94–104, Feb. 2016. doi: 10.1016/j.automatica.2015.10.039
[33] K. G. Vamvoudakis, M. F. Miranda, and J. P. Hespanha, “Asymptotically stable adaptive–optimal control algorithm with saturating actuators and relaxed persistence of excitation,” IEEE Trans. Neural Netw. Learning Syst., vol. 27, no. 11, pp. 2386–2398, Nov. 2016. doi: 10.1109/TNNLS.2015.2487972
[34] R. Kamalapurkar, H. Dinh, S. Bhasin, and W. E. Dixon, “Approximate optimal trajectory tracking for continuous-time nonlinear systems,” Automatica, vol. 51, pp. 40–48, Jan. 2015. doi: 10.1016/j.automatica.2014.10.103
[35] Y. Yang, K. G. Vamvoudakis, H. Modares, Y. Yin, and D. C. Wunsch, “Hamiltonian-driven hybrid adaptive dynamic programming,” IEEE Trans. Syst., Man, Cybernetics: Syst., vol. 51, no. 10, pp. 6423–6434, Oct. 2021. doi: 10.1109/TSMC.2019.2962103
[36] G. Chowdhary and E. Johnson, “A singular value maximizing data recording algorithm for concurrent learning,” in Proc. American Control Conf., San Francisco, USA, 2011, pp. 3547–3552.
[37] Y. Pan and H. Yu, “Composite learning from adaptive dynamic surface control,” IEEE Trans. Autom. Control, vol. 61, no. 9, pp. 2603–2609, Sept. 2016. doi: 10.1109/TAC.2015.2495232
[38] Y. Pan and H. Yu, “Composite learning robot control with guaranteed parameter convergence,” Automatica, vol. 89, pp. 398–406, Mar. 2018. doi: 10.1016/j.automatica.2017.11.032
[39] K. Guo, Y. Pan, D. Zheng, and H. Yu, “Composite learning control of robotic systems: A least squares modulated approach,” Automatica, vol. 111, p. 108612, Jan. 2020. doi: 10.1016/j.automatica.2019.108612
[40] H. Dong, Q. Hu, M. R. Akella, and H. Yang, “Composite adaptive attitude-tracking control with parameter convergence under finite excitation,” IEEE Trans. Control Syst. Technol., vol. 28, no. 6, pp. 2657–2664, Nov. 2020. doi: 10.1109/TCST.2019.2942802
[41] H. Modares and F. L. Lewis, “Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning,” Automatica, vol. 50, no. 7, pp. 1780–1792, Jul. 2014. doi: 10.1016/j.automatica.2014.05.011
[42] M. Abu-Khalaf and F. L. Lewis, “Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach,” Automatica, vol. 41, no. 5, pp. 779–791, May 2005. doi: 10.1016/j.automatica.2004.11.034
[43] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof,” IEEE Trans. Syst., Man, Cybernetics, Part B, vol. 38, no. 4, pp. 943–949, Aug. 2008. doi: 10.1109/TSMCB.2008.926614
[44] F. L. Lewis, S. Jagannathan, and A. Yesildirek, Neural Network Control of Robot Manipulators and Non-linear Systems. London, UK: CRC Press, 1998.
[45] P. A. Ioannou and J. Sun, Robust Adaptive Control. Upper Saddle River, USA: Prentice Hall, 1995.
[46] N. Cho, H. S. Shin, Y. Kim, and A. Tsourdos, “Composite model reference adaptive control with parameter convergence under finite excitation,” IEEE Trans. Autom. Control, vol. 63, no. 3, pp. 811–818, Mar. 2018. doi: 10.1109/TAC.2017.2737324
[47] C. Possieri and M. Sassano, “Data-driven policy iteration for nonlinear optimal control problems,” IEEE Trans. Neural Netw. Learning Syst., vol. 34, no. 10, pp. 7365–7376, Oct. 2023. doi: 10.1109/TNNLS.2022.3142501
[48] K. Hornik, M. Stinchcombe, and H. White, “Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks,” Neural Networks, vol. 3, no. 5, pp. 551–560, 1990. doi: 10.1016/0893-6080(90)90005-6
[49] B. A. Finlayson, The Method of Weighted Residuals and Variational Principles. Philadelphia, USA: Society for Industrial and Applied Mathematics, 2013.
[50] M. M. Peet, “Exponentially stable nonlinear systems have polynomial Lyapunov functions on bounded regions,” IEEE Trans. Autom. Control, vol. 54, no. 5, pp. 979–987, May 2009. doi: 10.1109/TAC.2009.2017116
[51] S. B. Roy, S. Bhasin, and I. N. Kar, “A UGES switched MRAC architecture using initial excitation,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 7044–7051, Jul. 2017. doi: 10.1016/j.ifacol.2017.08.1350
[52] O. Djaneye-Boundjou and R. Ordóñez, “Gradient-based discrete-time concurrent learning for standalone function approximation,” IEEE Trans. Autom. Control, vol. 65, no. 2, pp. 749–756, Feb. 2020. doi: 10.1109/TAC.2019.2920087
[53] W. M. Haddad and V. Chellaboina, Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton, USA: Princeton University Press, 2008.
[54] H. K. Khalil, Nonlinear Systems, 3rd ed. Upper Saddle River, USA: Prentice Hall, 2001.