A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation.
Volume 6, Issue 2, Mar. 2019

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Tao Bian and Zhong-Ping Jiang, "Reinforcement Learning for Linear Continuous-time Systems: an Incremental Learning Approach," IEEE/CAA J. Autom. Sinica, vol. 6, no. 2, pp. 433-440, Mar. 2019. doi: 10.1109/JAS.2019.1911390

Reinforcement Learning for Linear Continuous-time Systems: an Incremental Learning Approach

doi: 10.1109/JAS.2019.1911390
Funds: This work was supported by the National Science Foundation under Grants ECCS-1230040 and ECCS-1501044.

  • In this paper, we introduce a novel reinforcement learning (RL) scheme for linear continuous-time dynamical systems. In contrast to traditional batch learning algorithms, an incremental learning approach is developed, which provides a more efficient way to tackle the online learning problem in real-world applications. We provide a rigorous convergence and robustness analysis of this incremental learning algorithm, together with an extension to robust optimal control problems. Two simulation examples illustrate the effectiveness of the theoretical results.

     
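To make the contrast with batch learning concrete, the following Python sketch illustrates one way an incremental (per-sample) actor-critic update could be implemented for a continuous-time LQR problem. It is only an illustration under assumed dynamics, cost weights, and step sizes, not the algorithm analyzed in the paper; in particular, the actor step uses the input matrix B directly, whereas a fully data-driven scheme would avoid this. The key point is that each measured transition immediately updates the value estimate, so no batch of data has to be stored and solved in one shot.

```python
# A minimal, illustrative sketch of the incremental idea (not the authors'
# exact algorithm): per-sample actor-critic updates for a continuous-time
# LQR problem.  The plant (A, B), weights (Q, R), and step sizes below are
# assumptions chosen purely for illustration.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # assumed (open-loop stable) plant
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                              # state cost weight
R = np.eye(1)                              # control cost weight

dt = 0.005            # sampling / integration step
alpha = 0.5           # critic step size (illustrative; may need tuning)
beta = 0.01           # actor step size
P = np.eye(2)         # running estimate of the value matrix, V(x) = x' P x
K = np.zeros((1, 2))  # current feedback gain, u = -K x

x = np.array([1.0, -1.0])
for k in range(200_000):
    e = 0.1 * np.sin(0.1 * k * dt)             # exploration signal
    u = -K @ x + e
    x_next = x + dt * (A @ x + B @ u)          # one Euler step of the plant

    # Integral temporal-difference error over one step: along the trajectory,
    # x' P x should decrease by (approximately) the accumulated stage cost.
    stage_cost = (x @ Q @ x + u @ R @ u) * dt
    td = x_next @ P @ x_next - x @ P @ x + stage_cost

    # Incremental (per-sample) critic update: one gradient step on 0.5*td^2,
    # instead of a batch least-squares solve over stored data.
    grad = np.outer(x_next, x_next) - np.outer(x, x)
    P -= alpha * td * grad
    P = 0.5 * (P + P.T)                        # keep the estimate symmetric

    # Slower actor update toward the LQR-optimal gain R^{-1} B' P.  For
    # simplicity this step uses B directly; a fully data-driven scheme would
    # estimate this quantity from measurements as well.
    K += beta * (np.linalg.solve(R, B.T @ P) - K)

    x = x_next
    if np.linalg.norm(x) < 1e-3:               # re-excite when the state settles
        x = np.random.randn(2)

print("learned feedback gain K:", K)
```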

  • [1]
    M. L. Minsky, "Theory of neural-analog reinforcement systems and its application to the brain model problem, " Ph.D. dissertation, Princeton University, 1954.
    [2]
    D. P. Bertsekas, "Feature-based aggregation and deep reinforcement learning: A survey and some new implementations, " IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 1-31, 2019. doi: 10.1109/JAS.2018.7511249
    [3]
    F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Piscataway, NJ: John Wiley & Sons, Inc., 2013.
    [4]
    J. Si, A. G. Barto, W. B. Powell, and D. C. Wunsch, Eds., Handbook of Learning and Approximate Dynamic Programming. Piscataway, NJ: Wiley-IEEE Press, 2004.
    [5]
    R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (2nd edition). Cambridge, MA: The MIT Press, 2018.
    [6]
    A. G. Barto, P. S. Tomas, and R. S. Sutton, "Some recent applications of reinforcement learning, " in Proceedings of the Eighteenth Yale Workshop on Adaptive and Learning Systems, 2017.
    [7]
    P. W. Glimcher, "Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis, " in Proceedings of the National Academy of Sciences, vol. 108, no. Supplement 3, pp. 15 647-15 654, 2011. http://www.ncbi.nlm.nih.gov/pubmed/21389268
    [8]
    B. Lorica. (2017, Dec.) Practical applications of reinforcement learning in industry: An overview of commercial and industrial applications of reinforcement learning.[Online]. Available: https://www.oreilly.com/ideas/practical-applications-of-reinforcement-learning-in-industry
    [9]
    D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, "Mastering the game of go without human knowledge, " Nature, vol. 550, no. 7676, pp. 354-359, 10 2017. doi: 10.1038/nature24270
    [10]
    L. C. Baird, Ⅲ, "Reinforcement learning in continuous time: advantage updating, " in Proceedings of the International Conference on Neural Networks, vol. 4, Jun. 1994, pp. 2448-2453. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=374604
    [11]
    K. Doya, "Reinforcement learning in continuous time and space, " Neural Computation, vol. 12, no. 1, pp. 219-245, Jan. 2000. http://www.ncbi.nlm.nih.gov/pubmed/10636940
    [12]
    K. Doya, K. Samejima, K.-i. Katagiri, and M. Kawato, "Multiple model- based reinforcement learning, " Neural Computation, vol. 14, no. 6, pp. 1347-1369, 2002. doi: 10.1162/089976602753712972
    [13]
    E. A. Theodorou, J. Buchli, and S. Schaal, "A generalized path inte- gral control approach to reinforcement learning, " Journal of Machine Learning Research, vol. 11, pp. 3137-3181, Dec. 2010.
    [14]
    H. van Hasselt, "Reinforcement learning in continuous state and action spaces, " Reinforcement Learning: State-of-the-Art, M. Wiering and M. van Otterlo, Eds. Springer Berlin Heidelberg, 2012, vol. 12, ch. Chapter 7, pp. 207-251. http://www.springerlink.com/content/wt12078728654511/
    [15]
    H. van Hasselt and M. A. Wiering, "Reinforcement learning in contin- uous action spaces, " in Proc. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, April 2007, pp. 272-279. doi: 10.1007%2F978-3-642-27645-3_7?LI=true
    [16]
    S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, "Natural actor-critic algorithms, " Automatica, vol. 45, no. 11, pp. 2471-2482, 2009. doi: 10.1016/j.automatica.2009.07.008
    [17]
    A. Nedic and D. P. Bertsekas, "Least squares policy evaluation algo- rithms with linear function approximation, " Discrete Event Dynamic Systems, vol. 13, no. 1-2, pp. 79-110, 2003.
    [18]
    R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation, " in Proc. Advances in Neural Information Processing Systems, vol. 12. MIT Press, 2000, pp. 1057-1063.
    [19]
    J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q- learning, " Machine Learning, vol. 16, no. 3, pp. 185-202, 1994. http://d.old.wanfangdata.com.cn/NSTLHY/NSTL_HYCC025688365/
    [20]
    J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation, " IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674-690, 1997. doi: 10.1109/9.580874
    [21]
    Y. Jiang and Z. P. Jiang, Robust Adaptive Dynamic Programming. Hoboken, NJ: Wiley-IEEE Press, 2017.
    [22]
    R. Kamalapurkar, P. Walters, J. Rosenfeld, and W. E. Dixon, Rein-forcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach. Springer, Cham, 2018.
    [23]
    D. Liu, Q. Wei, D. Wang, X. Yang, and H. Li, Adaptive Dynamic Pro- gramming with Applications in Optimal Control. Springer International Publishing, 2017.
    [24]
    D. Wang, H. He, and D. Liu, "Adaptive critic nonlinear robust control: A survey, " IEEE Transactions on Cybernetics, vol. 47, no. 10, pp. 3429-3451, 2017. doi: 10.1109/TCYB.2017.2712188
    [25]
    S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Pearson Education, 2010.
    [26]
    T. Bian, Y. Jiang, and Z. P. Jiang, "Adaptive dynamic programming and optimal control of nonlinear nonaffine systems, " Automatica, vol. 50, no. 10, pp. 2624-2632, Oct. 2014. http://www.sciencedirect.com/science/article/pii/S0005109814003392
    [27]
    T. Bian and Z. P. Jiang, "Value iteration and adaptive dynamic pro- gramming for data-driven adaptive optimal control design, " Automatica, vol. 71, pp. 348-360, 2016. doi: 10.1016/j.automatica.2016.05.003
    [28]
    Y. Jiang and Z. P. Jiang, "Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics, " Automatica, vol. 48, no. 10, pp. 2699-2704, 2012. doi: 10.1016/j.automatica.2012.06.096
    [29]
    Y. Jiang and Z. P. Jiang, "Robust adaptive dynamic programming and feedback stabiliza- tion of nonlinear systems, " IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 5, pp. 882-893, May 2014.
    [30]
    T. Bian, Y. Jiang, and Z. P. Jiang, "Adaptive dynamic programming for stochastic systems with state and control dependent noise, " IEEE Transactions on Automatic Control, vol. 61, no. 12, pp. 4170-4175, Dec. 2016. http://ieeexplore.ieee.org/document/7447723/
    [31]
    T. Bian, Y. Jiang, and Z. P. Jiang, "Stochastic and adaptive optimal control of uncertain intercon- nected systems: A data-driven approach, " Systems & Control Letters, vol. 115, no. 5, pp. 48-54, May 2018.
    [32]
    T. Bian, Y. Jiang, and Z. P. Jiang, "Continuous-time robust dynamic programming, " ArXiv e-prints, vol. abs/1804.04577, 2018.[Online]. Available: https://arxiv.org/abs/1809.05867
    [33]
    Z. P. Jiang and Y. Jiang, "Robust adaptive dynamic programming for linear and nonlinear systems: An overview, " European Journal of Control, vol. 19, no. 5, pp. 417-425, 2013. doi: 10.1016/j.ejcon.2013.05.017
    [34]
    H. V. Henderson and S. R. Searle, "Vec and vech operators for matrices, with some uses in jacobians and multivariate statistics, " Canadian Journal of Statistics, vol. 7, no. 1, pp. 65-81, 1979. doi: 10.2307/3315017
    [35]
    D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton, NJ: Princeton University Press, 2012.
    [36]
    S. S. Haykin, Adaptive Filter Theory (5th edition). Harlow, UK: Pearson Education, 2014.
    [37]
    M. Krstic' and H. Deng, Stabilization of Nonlinear Uncertain Systems. Springer-Verlag New York, 1998.
    [38]
    E. D. Sontag, "Input to state stability: Basic concepts and results, " Nonlinear and Optimal Control Theory: Lectures given at the C.I.M.E. Summer School held in Cetraro, Italy June 19-29, 2004, P. Nistri and G. Stefani, Eds. Springer Berlin Heidelberg, 2008, pp. 163-220. doi: 10.1007%2F978-3-540-77653-6_3
    [39]
    E. D. Sontag, "Smooth stabilization implies coprime factorization, " IEEE Transactions on Automatic Control, vol. 34, no. 4, pp. 435-443, Apr 1989. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=28018
    [40]
    R. W. Erickson and D. Maksimovic', Fundamentals of Power Electronics (2nd edition). Springer US, 2001.


Article Metrics: Article views (2779), PDF downloads (182)
