A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 6 Issue 2
Mar.  2019

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
    CiteScore: 17.6, Top 3% (Q1)
    Google Scholar h5-index: 77, TOP 5
Citation: Tao Bian and Zhong-Ping Jiang, "Reinforcement Learning for Linear Continuous-time Systems: an Incremental Learning Approach," IEEE/CAA J. Autom. Sinica, vol. 6, no. 2, pp. 433-440, Mar. 2019. doi: 10.1109/JAS.2019.1911390

Reinforcement Learning for Linear Continuous-time Systems: an Incremental Learning Approach

doi: 10.1109/JAS.2019.1911390

Funding: National Science Foundation grants ECCS-1230040 and ECCS-1501044

Abstract: In this paper, we introduce a novel reinforcement learning (RL) scheme for linear continuous-time dynamical systems. Unlike traditional batch learning algorithms, the proposed incremental learning approach offers a more efficient way to tackle online learning problems in real-world applications. We provide concrete convergence and robustness analyses of this incremental learning algorithm, and we extend it to robust optimal control problems. Two simulation examples illustrate the effectiveness of our theoretical results.





    Figures(4)  / Tables(1)
