A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 11, Issue 1, Jan. 2024

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: D. Wang, N. Gao, D. Liu, J. Li, and F. Lewis, “Recent progress in reinforcement learning and adaptive dynamic programming for advanced control applications,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 1, pp. 18–36, Jan. 2024. doi: 10.1109/JAS.2023.123843

Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications

doi: 10.1109/JAS.2023.123843
Funds: This work was supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and Beijing Natural Science Foundation (JQ19013)
Abstract
  • Reinforcement learning (RL) has roots in dynamic programming and is known as adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP and RL and their applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Effective offline and online algorithms for ADP/adaptive critic control are presented, and the main results for discrete-time and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control under the event-triggered framework and in uncertain environments is discussed, covering event-based design, robust stabilization, and game design. Moreover, extensions of ADP for addressing control problems in complex environments have attracted considerable attention. The ADP architecture is revisited from the data-driven and RL perspectives, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
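For concreteness, the regulation problem underlying the surveyed discrete-time ADP algorithms can be stated in generic form; the symbols $F$, $U$, and $\gamma$ below are placeholders introduced for illustration rather than notation taken from any particular reference. For a system $x_{k+1} = F(x_k, u_k)$ with stage utility $U(x_k, u_k) \ge 0$ and discount factor $\gamma \in (0, 1]$, the cost to be minimized and the Bellman optimality equation are

$$ J(x_k) = \sum_{i=k}^{\infty} \gamma^{\,i-k} U(x_i, u_i), \qquad J^{*}(x_k) = \min_{u_k} \big\{ U(x_k, u_k) + \gamma J^{*}(x_{k+1}) \big\}. $$

Value iteration approximates $J^{*}$ through the recursion

$$ V_{j+1}(x_k) = \min_{u_k} \big\{ U(x_k, u_k) + \gamma V_j\big(F(x_k, u_k)\big) \big\}, \qquad V_0(\cdot) \equiv 0, $$

with the associated policy $\mu_j(x_k) = \arg\min_{u_k} \{ U(x_k, u_k) + \gamma V_j(F(x_k, u_k)) \}$, whereas policy iteration alternates policy evaluation and policy improvement starting from an admissible control law; adaptive critic implementations approximate $V_j$ and $\mu_j$ with critic and actor networks.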

     




    Highlights

    • This paper provides a comprehensive survey of recent developments in adaptive dynamic programming/reinforcement learning (ADP/RL) and their applications in various advanced control fields. Extensive work on both discrete-time and continuous-time systems is summarized (a minimal value-iteration sketch is given after this list)
    • The integration of ADP/RL with methods and problems such as event-triggered control, robust stabilization, game design, distributed system design, and model predictive control is discussed. The ADP architecture is also revisited from the data-driven and RL perspectives, demonstrating the broad applicability of ADP
    • Several typical control applications of RL and ADP are summarized, especially in the areas of wastewater treatment processes and power systems. In addition, a general prospect for future research is given, covering the combination of ADP with approximation error analysis, high-dimensional system control, advanced networking techniques and communication protocols, brain-like intelligence, and parallel control
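
    As a complement to the highlights above, the following NumPy sketch illustrates the basic value-iteration adaptive-critic recursion on a hypothetical scalar plant. The dynamics, quadratic utility, discount factor, state/action grids, and the interpolation-based critic are all illustrative assumptions chosen for this example; they are not models or algorithms taken from the surveyed works.

        # Minimal value-iteration ADP sketch (illustrative only).
        # Hypothetical plant: x_{k+1} = 0.9*x_k + 0.5*u_k*cos(x_k); utility U(x,u) = x^2 + u^2.
        import numpy as np

        def dynamics(x, u):
            # Hypothetical nonaffine discrete-time plant used only for illustration.
            return 0.9 * x + 0.5 * u * np.cos(x)

        def utility(x, u):
            # Quadratic stage cost.
            return x ** 2 + u ** 2

        gamma = 0.95                          # discount factor (assumed)
        xs = np.linspace(-2.0, 2.0, 201)      # state grid: the tabulated "critic"
        us = np.linspace(-2.0, 2.0, 101)      # action grid for the pointwise minimization
        V = np.zeros_like(xs)                 # V_0 = 0, the usual value-iteration start

        def critic(V, x):
            # Piecewise-linear interpolation of the tabulated cost-to-go.
            return np.interp(np.clip(x, xs[0], xs[-1]), xs, V)

        for j in range(200):                  # value-iteration sweeps
            X, Ug = np.meshgrid(xs, us, indexing="ij")
            Q = utility(X, Ug) + gamma * critic(V, dynamics(X, Ug))
            V_new = Q.min(axis=1)             # V_{j+1}(x) = min_u {U(x,u) + gamma*V_j(F(x,u))}
            if np.max(np.abs(V_new - V)) < 1e-6:
                V = V_new
                break
            V = V_new

        def policy(x):
            # Greedy (near-optimal) control recovered from the converged critic.
            q = utility(x, us) + gamma * critic(V, dynamics(x, us))
            return us[np.argmin(q)]

        x = 1.5                               # closed-loop rollout from x_0 = 1.5
        for k in range(10):
            u = policy(x)
            print(f"k={k:2d}  x={x:+.4f}  u={u:+.4f}")
            x = dynamics(x, u)

    The tabulated, interpolation-based critic keeps the sketch dependency-free; in the surveyed works the cost-to-go and control law are instead approximated by neural networks trained from data.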
