A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 9 Issue 7
Jul.  2022

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
    CiteScore: 17.6, Top 3% (Q1)
    Google Scholar h5-index: 77, TOP 5
Citation: M. M. Ha, D. Wang, and D. Liu, “Discounted iterative adaptive critic designs with novel stability analysis for tracking control,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1262–1272, Jul. 2022. doi: 10.1109/JAS.2022.105692

Discounted Iterative Adaptive Critic Designs With Novel Stability Analysis for Tracking Control

doi: 10.1109/JAS.2022.105692
Funds: This work was supported in part by the Beijing Natural Science Foundation (JQ19013), the National Key Research and Development Program of China (2021ZD0112302), and the National Natural Science Foundation of China (61773373).
  • The core task of tracking control is to make the controlled plant follow a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop a value-iteration-based adaptive critic framework for the tracking control problem. Unlike in the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is therefore developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function is elaborated for the special case of linear systems. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with that of the traditional approaches.
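For orientation, a discounted value-iteration scheme for tracking is usually written in the generic form below. This is only a sketch of the standard formulation, not the specific cost function introduced in the paper: the plant map $F$, the reference generator $\psi$, the utility $U$, the discount factor $\gamma$, and the zero initialization $V_0 = 0$ are all notational assumptions made here for illustration.

```latex
% Tracking error between the plant state x_k and the reference r_k
e_k = x_k - r_k, \qquad x_{k+1} = F(x_k, u_k), \qquad r_{k+1} = \psi(r_k)

% Discounted cost with a generic utility U and discount factor 0 < \gamma \le 1
J(e_k, r_k) = \sum_{p=k}^{\infty} \gamma^{\,p-k}\, U(e_p, r_p, u_p)

% Value iteration, started here from V_0 = 0 (an assumption)
V_{i+1}(e_k, r_k) = \min_{u_k} \Big\{ U(e_k, r_k, u_k)
    + \gamma\, V_i\big( F(e_k + r_k, u_k) - \psi(r_k),\ \psi(r_k) \big) \Big\}
```

In this form the choice of $U$ is what separates the traditional performance index from the new one: the abstract states that the traditional choice leaves a residual tracking error, while the new cost function is designed so that the error converges to zero.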

     



    Highlights

    • The core findings
    • Based on a new performance index function, this paper establishes a novel stability analysis method for the tracking control problem, which guarantees that the tracking error is eliminated completely. The effect of the approximation errors introduced by the value function approximator on the stability of the controlled system is discussed. For linear systems, a new value-iteration (VI)-based adaptive critic scheme that iterates between the kernel matrix and the state feedback gain is developed.
    • The essence of the research
    • Optimal tracking control is a significant topic in the control community. Some tracking control methods solve for the feedforward control of the reference trajectory and transform the tracking control problem into a regulator problem. However, the feedforward control input might not exist. Other methods establish a cost function of the tracking error and the control input, under which the tracking error cannot be eliminated. It is therefore necessary to adopt a new cost function and develop a novel stability analysis method to guarantee that the tracking error converges to zero.
    • The distinction of the paper
    • This paper adopts a new performance index function to develop a value-iteration-based adaptive critic framework for the tracking control problem. Unlike in the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error can be eliminated completely. In addition, the effect of the approximation errors introduced by the value function approximator on the stability of the controlled system is discussed.
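
The points above about the traditional cost function and the kernel-matrix/feedback-gain iteration for linear systems can be illustrated with a small sketch. The code below is not the scheme proposed in the paper: it implements the textbook discounted linear-quadratic value iteration under the traditional cost (tracking error plus control effort) on an augmented state of plant and reference. The scalar plant, the constant reference, and every numerical value are hypothetical choices made only for illustration.

```python
import numpy as np


def discounted_lq_value_iteration(A, B, Q, R, gamma, tol=1e-10, max_iter=5000):
    """Textbook discounted LQ value iteration (illustrative, not the paper's scheme).

    Iterates the kernel matrix P from P = 0 and returns it together with the
    state feedback gain K of the greedy policy u = -K X.
    """
    n = A.shape[0]
    P = np.zeros((n, n))
    for _ in range(max_iter):
        S = R + gamma * B.T @ P @ B                  # input weighting seen by the minimizer
        K = gamma * np.linalg.solve(S, B.T @ P @ A)  # greedy gain for the current kernel
        P_next = Q + gamma * A.T @ P @ A - gamma * A.T @ P @ B @ K
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    S = R + gamma * B.T @ P @ B
    K = gamma * np.linalg.solve(S, B.T @ P @ A)      # gain matching the final kernel
    return P, K


# Hypothetical scalar plant x_{k+1} = a x_k + b u_k tracking a constant reference r.
a, b, gamma = 0.5, 1.0, 0.9
A = np.array([[a, 0.0], [0.0, 1.0]])   # augmented state X = [x, r], with r_{k+1} = r_k
B = np.array([[b], [0.0]])
C = np.array([[1.0, -1.0]])            # tracking error e = x - r = C X
Q = C.T @ C                            # traditional cost: e^2 + u^2 per step
R = np.array([[1.0]])

P, K = discounted_lq_value_iteration(A, B, Q, R, gamma)


def simulate_tracking_error(K, x0, r, steps):
    """Run the closed loop u = -K X and return the final tracking error x - r."""
    X = np.array([x0, r])
    for _ in range(steps):
        u = -K @ X
        X = A @ X + B @ u
    return X[0] - X[1]


final_error = simulate_tracking_error(K, x0=0.0, r=1.0, steps=200)
print("kernel matrix P:\n", P)
print("feedback gain K:", K)
print("tracking error after 200 steps:", final_error)
```

In this toy setup the closed loop settles with a nonzero tracking error, because holding the state at the reference requires a nonzero control input that the traditional cost penalizes. This is the kind of residual error that motivates replacing the cost function.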
