IEEE/CAA Journal of Automatica Sinica
Citation: Tao Bian and Zhong-Ping Jiang, "Reinforcement Learning for Linear Continuous-time Systems: an Incremental Learning Approach," IEEE/CAA J. Autom. Sinica, vol. 6, no. 2, pp. 433-440, Mar. 2019. doi: 10.1109/JAS.2019.1911390