IEEE/CAA Journal of Automatica Sinica, Volume 6, Issue 2
| Citation: | Tao Bian and Zhong-Ping Jiang, "Reinforcement Learning for Linear Continuous-time Systems: An Incremental Learning Approach," IEEE/CAA J. Autom. Sinica, vol. 6, no. 2, pp. 433-440, Mar. 2019. doi: 10.1109/JAS.2019.1911390 |
 
| [1] | M. L. Minsky, "Theory of neural-analog reinforcement systems and its application to the brain model problem," Ph.D. dissertation, Princeton University, 1954. |
| [2] | D. P. Bertsekas, "Feature-based aggregation and deep reinforcement learning: A survey and some new implementations," IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 1-31, 2019. doi: 10.1109/JAS.2018.7511249 |
| [3] | F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Piscataway, NJ: John Wiley & Sons, Inc., 2013. |
| [4] | J. Si, A. G. Barto, W. B. Powell, and D. C. Wunsch, Eds., Handbook of Learning and Approximate Dynamic Programming. Piscataway, NJ: Wiley-IEEE Press, 2004. |
| [5] | R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (2nd edition). Cambridge, MA: The MIT Press, 2018. |
| [6] | A. G. Barto, P. S. Thomas, and R. S. Sutton, "Some recent applications of reinforcement learning," in Proceedings of the Eighteenth Yale Workshop on Adaptive and Learning Systems, 2017. |
| [7] | P. W. Glimcher, "Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis," Proceedings of the National Academy of Sciences, vol. 108, suppl. 3, pp. 15647-15654, 2011. http://www.ncbi.nlm.nih.gov/pubmed/21389268 |
| [8] | B. Lorica. (2017, Dec.) Practical applications of reinforcement learning in industry: An overview of commercial and industrial applications of reinforcement learning. [Online]. Available: https://www.oreilly.com/ideas/practical-applications-of-reinforcement-learning-in-industry |
| [9] | D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354-359, Oct. 2017. doi: 10.1038/nature24270 |
| [10] | L. C. Baird, III, "Reinforcement learning in continuous time: Advantage updating," in Proceedings of the International Conference on Neural Networks, vol. 4, Jun. 1994, pp. 2448-2453. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=374604 |
| [11] | K. Doya, "Reinforcement learning in continuous time and space," Neural Computation, vol. 12, no. 1, pp. 219-245, Jan. 2000. http://www.ncbi.nlm.nih.gov/pubmed/10636940 |
| [12] | K. Doya, K. Samejima, K.-i. Katagiri, and M. Kawato, "Multiple model-based reinforcement learning," Neural Computation, vol. 14, no. 6, pp. 1347-1369, 2002. doi: 10.1162/089976602753712972 |
| [13] | E. A. Theodorou, J. Buchli, and S. Schaal, "A generalized path integral control approach to reinforcement learning," Journal of Machine Learning Research, vol. 11, pp. 3137-3181, Dec. 2010. |
| [14] | H. van Hasselt, "Reinforcement learning in continuous state and action spaces," in Reinforcement Learning: State-of-the-Art, M. Wiering and M. van Otterlo, Eds. Springer Berlin Heidelberg, 2012, vol. 12, ch. 7, pp. 207-251. http://www.springerlink.com/content/wt12078728654511/ |
| [15] | H. van Hasselt and M. A. Wiering, "Reinforcement learning in continuous action spaces," in Proc. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, Apr. 2007, pp. 272-279. |
| [16] | S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, "Natural actor-critic algorithms," Automatica, vol. 45, no. 11, pp. 2471-2482, 2009. doi: 10.1016/j.automatica.2009.07.008 |
| [17] | A. Nedic and D. P. Bertsekas, "Least squares policy evaluation algorithms with linear function approximation," Discrete Event Dynamic Systems, vol. 13, no. 1-2, pp. 79-110, 2003. |
| [18] | R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Proc. Advances in Neural Information Processing Systems, vol. 12. MIT Press, 2000, pp. 1057-1063. |
| [19] | J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, no. 3, pp. 185-202, 1994. http://d.old.wanfangdata.com.cn/NSTLHY/NSTL_HYCC025688365/ |
| [20] | J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674-690, 1997. doi: 10.1109/9.580874 |
| [21] | Y. Jiang and Z. P. Jiang, Robust Adaptive Dynamic Programming. Hoboken, NJ: Wiley-IEEE Press, 2017. |
| [22] | R. Kamalapurkar, P. Walters, J. Rosenfeld, and W. E. Dixon, Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach. Springer, Cham, 2018. |
| [23] | D. Liu, Q. Wei, D. Wang, X. Yang, and H. Li, Adaptive Dynamic Programming with Applications in Optimal Control. Springer International Publishing, 2017. |
| [24] | D. Wang, H. He, and D. Liu, "Adaptive critic nonlinear robust control: A survey," IEEE Transactions on Cybernetics, vol. 47, no. 10, pp. 3429-3451, 2017. doi: 10.1109/TCYB.2017.2712188 |
| [25] | S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Pearson Education, 2010. |
| [26] | T. Bian, Y. Jiang, and Z. P. Jiang, "Adaptive dynamic programming and optimal control of nonlinear nonaffine systems," Automatica, vol. 50, no. 10, pp. 2624-2632, Oct. 2014. http://www.sciencedirect.com/science/article/pii/S0005109814003392 |
| [27] | T. Bian and Z. P. Jiang, "Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design," Automatica, vol. 71, pp. 348-360, 2016. doi: 10.1016/j.automatica.2016.05.003 |
| [28] | Y. Jiang and Z. P. Jiang, "Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics," Automatica, vol. 48, no. 10, pp. 2699-2704, 2012. doi: 10.1016/j.automatica.2012.06.096 |
| [29] | Y. Jiang and Z. P. Jiang, "Robust adaptive dynamic programming and feedback stabilization of nonlinear systems," IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 5, pp. 882-893, May 2014. |
| [30] | T. Bian, Y. Jiang, and Z. P. Jiang, "Adaptive dynamic programming for stochastic systems with state and control dependent noise," IEEE Transactions on Automatic Control, vol. 61, no. 12, pp. 4170-4175, Dec. 2016. http://ieeexplore.ieee.org/document/7447723/ |
| [31] | T. Bian, Y. Jiang, and Z. P. Jiang, "Stochastic and adaptive optimal control of uncertain interconnected systems: A data-driven approach," Systems & Control Letters, vol. 115, pp. 48-54, May 2018. |
| [32] | T. Bian, Y. Jiang, and Z. P. Jiang, "Continuous-time robust dynamic programming," ArXiv e-prints, 2018. [Online]. Available: https://arxiv.org/abs/1809.05867 |
| [33] | Z. P. Jiang and Y. Jiang, "Robust adaptive dynamic programming for linear and nonlinear systems: An overview," European Journal of Control, vol. 19, no. 5, pp. 417-425, 2013. doi: 10.1016/j.ejcon.2013.05.017 |
| [34] | H. V. Henderson and S. R. Searle, "Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics," Canadian Journal of Statistics, vol. 7, no. 1, pp. 65-81, 1979. doi: 10.2307/3315017 |
| [35] | D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton, NJ: Princeton University Press, 2012. |
| [36] | S. S. Haykin, Adaptive Filter Theory (5th edition). Harlow, UK: Pearson Education, 2014. |
| [37] | M. Krstić and H. Deng, Stabilization of Nonlinear Uncertain Systems. Springer-Verlag New York, 1998. |
| [38] | E. D. Sontag, "Input to state stability: Basic concepts and results," in Nonlinear and Optimal Control Theory: Lectures given at the C.I.M.E. Summer School held in Cetraro, Italy, June 19-29, 2004, P. Nistri and G. Stefani, Eds. Springer Berlin Heidelberg, 2008, pp. 163-220. doi: 10.1007/978-3-540-77653-6_3 |
| [39] | E. D. Sontag, "Smooth stabilization implies coprime factorization," IEEE Transactions on Automatic Control, vol. 34, no. 4, pp. 435-443, Apr. 1989. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=28018 |
| [40] | R. W. Erickson and D. Maksimović, Fundamentals of Power Electronics (2nd edition). Springer US, 2001. |