IEEE/CAA Journal of Automatica Sinica
Citation: D. Wang, J. Y. Wang, M. M. Zhao, P. Xin, and J. F. Qiao, “Adaptive multi-step evaluation design with stability guarantee for discrete-time optimal learning control,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 9, pp. 1797–1809, Sept. 2023. doi: 10.1109/JAS.2023.123684
[1] D. Silver, A. Huang, C. J. Maddison, et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016. doi: 10.1038/nature16961
[2] N. A. Barriga, M. Stanescu, F. Besoain, and M. Buro, “Improving RTS game AI by supervised policy learning, tactical search, and deep reinforcement learning,” IEEE Computational Intelligence Magazine, vol. 14, no. 3, pp. 8–18, Aug. 2019. doi: 10.1109/MCI.2019.2919363
[3] Y. Jiang, Z. Cao, and J. Zhang, “Learning to solve 3-D bin packing problem via deep reinforcement learning and constraint programming,” IEEE Trans. Cybernetics, vol. 53, no. 5, pp. 2864–2875, May 2023.
[4] D. Wang, H. He, X. Zhong, and D. Liu, “Event-driven nonlinear discounted optimal regulation involving a power system application,” IEEE Trans. Industrial Electronics, vol. 64, no. 10, pp. 8177–8186, Oct. 2017. doi: 10.1109/TIE.2017.2698377
[5] E. Foruzan, L. K. Soh, and S. Asgarpoor, “Reinforcement learning approach for optimal distributed energy management in a microgrid,” IEEE Trans. Power Systems, vol. 33, no. 5, pp. 5749–5758, Sept. 2018. doi: 10.1109/TPWRS.2018.2823641
[6] D. Chen, K. Chen, Z. Li, T. Chu, et al., “PowerNet: Multi-agent deep reinforcement learning for scalable powergrid control,” IEEE Trans. Power Systems, vol. 37, no. 2, pp. 1007–1017, Mar. 2022. doi: 10.1109/TPWRS.2021.3100898
[7] D. Wang, M. Ha, and M. Zhao, “The intelligent critic framework for advanced optimal control,” Artificial Intelligence Review, vol. 55, no. 1, pp. 1–22, Jan. 2022. doi: 10.1007/s10462-021-10118-9
[8] M. Ha, D. Wang, and D. Liu, “Discounted iterative adaptive critic designs with novel stability analysis for tracking control,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1262–1272, Jul. 2022. doi: 10.1109/JAS.2022.105692
[9] Y. Li, Y. Liu, and S. Tong, “Observer-based neuro-adaptive optimized control of strict-feedback nonlinear systems with state constraints,” IEEE Trans. Neural Networks and Learning Systems, vol. 33, no. 7, pp. 3131–3145, Jul. 2022. doi: 10.1109/TNNLS.2021.3051030
[10] H. Wang, C. Yang, X. Liu, and L. Zhou, “Neural-network-based adaptive control of uncertain MIMO singularly perturbed systems with full-state constraints,” IEEE Trans. Neural Networks and Learning Systems, 2022.
[11] C. Chen, H. Modares, K. Xie, F. L. Lewis, Y. Wan, and S. Xie, “Reinforcement learning-based adaptive optimal exponential tracking control of linear systems with unknown dynamics,” IEEE Trans. Automatic Control, vol. 64, no. 11, pp. 4423–4438, Nov. 2019. doi: 10.1109/TAC.2019.2905215
[12] S. Song, M. Zhu, X. Dai, and D. Gong, “Model-free optimal tracking control of nonlinear input-affine discrete-time systems via an iterative deterministic Q-learning algorithm,” IEEE Trans. Neural Networks and Learning Systems, 2022.
[13] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. Hoboken, USA: Wiley, 2007.
[14] D. V. Prokhorov and D. C. Wunsch, “Adaptive critic designs,” IEEE Trans. Neural Networks, vol. 8, no. 5, pp. 997–1007, Sept. 1997. doi: 10.1109/72.623201
[15] P. J. Werbos, “Approximate dynamic programming for real-time control and neural modeling,” in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, Eds. New York, USA: Van Nostrand Reinhold, 1992, ch. 13.
[16] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 1. Belmont, USA: Athena Scientific, 1995.
[17] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, USA: MIT Press, 1998.
[18] J. Si and Y. T. Wang, “On-line learning control by association and reinforcement,” IEEE Trans. Neural Networks, vol. 12, no. 2, pp. 264–276, Mar. 2001. doi: 10.1109/72.914523
[19] S. Al-Dabooni and D. C. Wunsch, “Online model-free n-step HDP with stability analysis,” IEEE Trans. Neural Networks and Learning Systems, vol. 31, no. 4, pp. 1255–1269, Apr. 2020. doi: 10.1109/TNNLS.2019.2919614
[20] B. Luo, D. Liu, T. Huang, X. Yang, and H. Ma, “Multi-step heuristic dynamic programming for optimal control of nonlinear discrete-time systems,” Information Sciences, vol. 411, pp. 66–83, May 2017. doi: 10.1016/j.ins.2017.05.005
[21] B. Luo, D. Liu, T. Huang, and J. Liu, “Output tracking control based on adaptive dynamic programming with multistep policy evaluation,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 49, no. 10, pp. 2155–2165, Oct. 2019. doi: 10.1109/TSMC.2017.2771516
[22] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in Proc. Int. Conf. Machine Learning, Jun. 2016, vol. 48, pp. 1928–1937.
[23] Q. Wei, L. Wang, Y. Liu, and M. M. Polycarpou, “Optimal elevator group control via deep asynchronous actor-critic learning,” IEEE Trans. Neural Networks and Learning Systems, vol. 31, no. 12, pp. 5245–5256, Dec. 2020. doi: 10.1109/TNNLS.2020.2965208
[24] M. Ha, D. Wang, and D. Liu, “Offline and online adaptive critic control designs with stability guarantee through value iteration,” IEEE Trans. Cybernetics, vol. 52, no. 12, pp. 13262–13274, Dec. 2022.
[25] D. Wang, M. Zhao, M. Ha, and L. Hu, “Adaptive-critic-based hybrid intelligent optimal tracking for a class of nonlinear discrete-time systems,” Engineering Applications of Artificial Intelligence, vol. 105, p. 104443, Oct. 2021.
[26] A. Roy, V. Borkar, A. Karandikar, and P. Chaporkar, “Online reinforcement learning of optimal threshold policies for Markov decision processes,” IEEE Trans. Automatic Control, vol. 67, no. 7, pp. 3722–3729, Jul. 2022. doi: 10.1109/TAC.2021.3108121
[27] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof,” IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 943–949, Aug. 2008. doi: 10.1109/TSMCB.2008.926614
[28] D. Wang, M. Ha, and J. Qiao, “Self-learning optimal regulation for discrete-time nonlinear systems under event-driven formulation,” IEEE Trans. Automatic Control, vol. 65, no. 3, pp. 1272–1279, Mar. 2020. doi: 10.1109/TAC.2019.2926167
[29] M. Ha, D. Wang, and D. Liu, “A novel value iteration scheme with adjustable convergence rate,” IEEE Trans. Neural Networks and Learning Systems, 2022.
[30] B. Luo, Y. Yang, H. N. Wu, and T. Huang, “Balancing value iteration and policy iteration for discrete-time control,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 50, no. 11, pp. 3948–3958, Nov. 2020. doi: 10.1109/TSMC.2019.2898389
[31] D. Liu and Q. Wei, “Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems,” IEEE Trans. Neural Networks and Learning Systems, vol. 25, no. 3, pp. 621–634, Mar. 2014. doi: 10.1109/TNNLS.2013.2281663
[32] D. Liu, Q. Wei, and P. Yan, “Generalized policy iteration adaptive dynamic programming for discrete-time nonlinear systems,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 45, no. 12, pp. 1577–1591, Dec. 2015. doi: 10.1109/TSMC.2015.2417510
[33] D. Bertsekas, “Multiagent reinforcement learning: Rollout and policy iteration,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 2, pp. 249–272, Feb. 2021. doi: 10.1109/JAS.2021.1003814
[34] Q. Wei and D. Liu, “Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems,” IEEE Trans. Cybernetics, vol. 46, no. 3, pp. 840–853, Mar. 2016. doi: 10.1109/TCYB.2015.2492242
[35] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009. doi: 10.1109/MCAS.2009.933854
[36] H. Zhang, L. Cui, X. Zhang, and Y. Luo, “Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method,” IEEE Trans. Neural Networks, vol. 22, no. 12, pp. 2226–2236, Dec. 2011. doi: 10.1109/TNN.2011.2168538
[37] D. Wang, M. Zhao, M. Ha, and J. Qiao, “Stability and admissibility analysis for zero-sum games under general value iteration formulation,” IEEE Trans. Neural Networks and Learning Systems, 2022.
[38] M. Ha, D. Wang, and D. Liu, “Neural-network-based discounted optimal control via an integrated value iteration with accuracy guarantee,” Neural Networks, vol. 144, pp. 176–186, 2021. doi: 10.1016/j.neunet.2021.08.025
[39] A. Heydari, “Stability analysis of optimal adaptive control using value iteration with approximation errors,” IEEE Trans. Automatic Control, vol. 63, no. 9, pp. 3119–3126, Sept. 2018. doi: 10.1109/TAC.2018.2790260
[40] M. Lin, B. Zhao, and D. Liu, “Policy gradient adaptive critic designs for model-free optimal tracking control with experience replay,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 52, no. 6, pp. 3692–3703, Jun. 2022. doi: 10.1109/TSMC.2021.3071968
[41] R. S. Sutton and A. G. Barto, “Eligibility traces,” in Reinforcement Learning: An Introduction. Cambridge, USA: MIT Press, 1998, pp. 163–192.
[42] T. Landelius, “Reinforcement learning and distributed local model synthesis,” Ph.D. dissertation, Linköping Univ., Linköping, Sweden, 1997.
[43] D. P. Iracleous and A. T. Alexandridis, “A multi-task automatic generation control for power regulation,” Electric Power Systems Research, vol. 73, no. 3, pp. 275–285, Mar. 2005. doi: 10.1016/j.epsr.2004.06.011