IEEE/CAA Journal of Automatica Sinica, Volume 6, Issue 2
| Citation: | Tao Bian and Zhong-Ping Jiang, "Reinforcement Learning for Linear Continuous-time Systems: An Incremental Learning Approach," IEEE/CAA J. Autom. Sinica, vol. 6, no. 2, pp. 433-440, Mar. 2019. doi: 10.1109/JAS.2019.1911390 |
 
| [1] | M. L. Minsky, "Theory of neural-analog reinforcement systems and its application to the brain model problem," Ph.D. dissertation, Princeton University, 1954. |
| [2] | D. P. Bertsekas, "Feature-based aggregation and deep reinforcement learning: A survey and some new implementations," IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 1-31, 2019. doi: 10.1109/JAS.2018.7511249 |
| [3] | F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Piscataway, NJ: John Wiley & Sons, Inc., 2013. |
| [4] | J. Si, A. G. Barto, W. B. Powell, and D. C. Wunsch, Eds., Handbook of Learning and Approximate Dynamic Programming. Piscataway, NJ: Wiley-IEEE Press, 2004. |
| [5] | R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (2nd edition). Cambridge, MA: The MIT Press, 2018. |
| [6] | A. G. Barto, P. S. Thomas, and R. S. Sutton, "Some recent applications of reinforcement learning," in Proceedings of the Eighteenth Yale Workshop on Adaptive and Learning Systems, 2017. |
| [7] | P. W. Glimcher, "Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis," Proceedings of the National Academy of Sciences, vol. 108, suppl. 3, pp. 15647-15654, 2011. http://www.ncbi.nlm.nih.gov/pubmed/21389268 |
| [8] | B. Lorica. (2017, Dec.) Practical applications of reinforcement learning in industry: An overview of commercial and industrial applications of reinforcement learning. [Online]. Available: https://www.oreilly.com/ideas/practical-applications-of-reinforcement-learning-in-industry |
| [9] | D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354-359, Oct. 2017. doi: 10.1038/nature24270 |
| [10] | L. C. Baird, III, "Reinforcement learning in continuous time: Advantage updating," in Proceedings of the International Conference on Neural Networks, vol. 4, Jun. 1994, pp. 2448-2453. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=374604 |
| [11] | K. Doya, "Reinforcement learning in continuous time and space," Neural Computation, vol. 12, no. 1, pp. 219-245, Jan. 2000. http://www.ncbi.nlm.nih.gov/pubmed/10636940 |
| [12] | K. Doya, K. Samejima, K.-i. Katagiri, and M. Kawato, "Multiple model-based reinforcement learning," Neural Computation, vol. 14, no. 6, pp. 1347-1369, 2002. doi: 10.1162/089976602753712972 |
| [13] | E. A. Theodorou, J. Buchli, and S. Schaal, "A generalized path integral control approach to reinforcement learning," Journal of Machine Learning Research, vol. 11, pp. 3137-3181, Dec. 2010. |
| [14] | H. van Hasselt, "Reinforcement learning in continuous state and action spaces," in Reinforcement Learning: State-of-the-Art, M. Wiering and M. van Otterlo, Eds. Springer Berlin Heidelberg, 2012, vol. 12, ch. 7, pp. 207-251. http://www.springerlink.com/content/wt12078728654511/ |
| [15] | H. van Hasselt and M. A. Wiering, "Reinforcement learning in continuous action spaces," in Proc. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, Apr. 2007, pp. 272-279. |
| [16] | S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, "Natural actor-critic algorithms," Automatica, vol. 45, no. 11, pp. 2471-2482, 2009. doi: 10.1016/j.automatica.2009.07.008 |
| [17] | A. Nedic and D. P. Bertsekas, "Least squares policy evaluation algorithms with linear function approximation," Discrete Event Dynamic Systems, vol. 13, no. 1-2, pp. 79-110, 2003. |
| [18] | R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Proc. Advances in Neural Information Processing Systems, vol. 12. MIT Press, 2000, pp. 1057-1063. |
| [19] | J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, no. 3, pp. 185-202, 1994. http://d.old.wanfangdata.com.cn/NSTLHY/NSTL_HYCC025688365/ |
| [20] | J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674-690, 1997. doi: 10.1109/9.580874 |
| [21] | Y. Jiang and Z. P. Jiang, Robust Adaptive Dynamic Programming. Hoboken, NJ: Wiley-IEEE Press, 2017. |
| [22] | R. Kamalapurkar, P. Walters, J. Rosenfeld, and W. E. Dixon, Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach. Springer, Cham, 2018. |
| [23] | D. Liu, Q. Wei, D. Wang, X. Yang, and H. Li, Adaptive Dynamic Programming with Applications in Optimal Control. Springer International Publishing, 2017. |
| [24] | D. Wang, H. He, and D. Liu, "Adaptive critic nonlinear robust control: A survey," IEEE Transactions on Cybernetics, vol. 47, no. 10, pp. 3429-3451, 2017. doi: 10.1109/TCYB.2017.2712188 |
| [25] | S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Pearson Education, 2010. |
| [26] | T. Bian, Y. Jiang, and Z. P. Jiang, "Adaptive dynamic programming and optimal control of nonlinear nonaffine systems," Automatica, vol. 50, no. 10, pp. 2624-2632, Oct. 2014. http://www.sciencedirect.com/science/article/pii/S0005109814003392 |
| [27] | T. Bian and Z. P. Jiang, "Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design," Automatica, vol. 71, pp. 348-360, 2016. doi: 10.1016/j.automatica.2016.05.003 |
| [28] | Y. Jiang and Z. P. Jiang, "Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics," Automatica, vol. 48, no. 10, pp. 2699-2704, 2012. doi: 10.1016/j.automatica.2012.06.096 |
| [29] | Y. Jiang and Z. P. Jiang, "Robust adaptive dynamic programming and feedback stabilization of nonlinear systems," IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 5, pp. 882-893, May 2014. |
| [30] | T. Bian, Y. Jiang, and Z. P. Jiang, "Adaptive dynamic programming for stochastic systems with state and control dependent noise," IEEE Transactions on Automatic Control, vol. 61, no. 12, pp. 4170-4175, Dec. 2016. http://ieeexplore.ieee.org/document/7447723/ |
| [31] | T. Bian, Y. Jiang, and Z. P. Jiang, "Stochastic and adaptive optimal control of uncertain interconnected systems: A data-driven approach," Systems & Control Letters, vol. 115, pp. 48-54, May 2018. |
| [32] | T. Bian, Y. Jiang, and Z. P. Jiang, "Continuous-time robust dynamic programming," ArXiv e-prints, 2018. [Online]. Available: https://arxiv.org/abs/1809.05867 |
| [33] | Z. P. Jiang and Y. Jiang, "Robust adaptive dynamic programming for linear and nonlinear systems: An overview," European Journal of Control, vol. 19, no. 5, pp. 417-425, 2013. doi: 10.1016/j.ejcon.2013.05.017 |
| [34] | H. V. Henderson and S. R. Searle, "Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics," Canadian Journal of Statistics, vol. 7, no. 1, pp. 65-81, 1979. doi: 10.2307/3315017 |
| [35] | D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton, NJ: Princeton University Press, 2012. |
| [36] | S. S. Haykin, Adaptive Filter Theory (5th edition). Harlow, UK: Pearson Education, 2014. |
| [37] | M. Krstić and H. Deng, Stabilization of Nonlinear Uncertain Systems. Springer-Verlag New York, 1998. |
| [38] | E. D. Sontag, "Input to state stability: Basic concepts and results," in Nonlinear and Optimal Control Theory: Lectures given at the C.I.M.E. Summer School held in Cetraro, Italy, June 19-29, 2004, P. Nistri and G. Stefani, Eds. Springer Berlin Heidelberg, 2008, pp. 163-220. doi: 10.1007/978-3-540-77653-6_3 |
| [39] | E. D. Sontag, "Smooth stabilization implies coprime factorization," IEEE Transactions on Automatic Control, vol. 34, no. 4, pp. 435-443, Apr. 1989. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=28018 |
| [40] | R. W. Erickson and D. Maksimović, Fundamentals of Power Electronics (2nd edition). Springer US, 2001. |