A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 10, Issue 1
Jan. 2023

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: A. Perrusquía and W. Guo, “Optimal control of nonlinear systems using experience inference human-behavior learning,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 1, pp. 90–102, Jan. 2023. doi: 10.1109/JAS.2023.123009

Optimal Control of Nonlinear Systems Using Experience Inference Human-Behavior Learning

doi: 10.1109/JAS.2023.123009
Funds: This work was supported by the Royal Academy of Engineering and the Office of the Chief Science Adviser for National Security under the UK Intelligence Community Postdoctoral Research Fellowship programme.
Abstract
  • Safety-critical control is often trained in a simulated environment to mitigate risk, and subsequent migration of the resulting biased controller requires further adjustments. In this paper, an experience-inference human-behavior learning approach is proposed to solve the migration problem of optimal controllers applied to real-world nonlinear systems. The approach is inspired by the complementary properties exhibited by the hippocampus, neocortex, and striatum learning systems of the brain. The hippocampus defines a physics-informed reference model of the real-world nonlinear system for experience inference, and the neocortex is the adaptive dynamic programming (ADP) or reinforcement learning (RL) algorithm that ensures optimal performance of the reference model. This optimal performance is transferred to the real-world nonlinear system by means of an adaptive neocortex/striatum control policy that forces the nonlinear system to behave as the reference model. Stability and convergence of the proposed approach are analyzed using Lyapunov stability theory. Simulation studies are carried out to verify the approach.
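
A minimal, self-contained sketch of the loop described in the abstract is given below, on a scalar example and under stated assumptions: a physics-informed linear reference model plays the hippocampus role, a closed-form LQR gain on that model stands in for the ADP/RL (neocortex) stage, and a simple adaptive high-gain term illustrates the neocortex/striatum policy that drives the real plant toward the reference model. All dynamics, gains, and names below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' algorithm): a scalar example of the
# hippocampus / neocortex / striatum loop described in the abstract.
import numpy as np

# Real nonlinear plant x_dot = f(x) + g(x)*u; its parameters are treated as unknown.
def plant(x, u):
    return -1.5 * x - 0.5 * np.sin(x) + (1.0 + 0.1 * x**2) * u

# "Hippocampus": physics-informed linear reference model x_m_dot = a_ref*x_m + b_ref*u_m.
a_ref, b_ref = -1.0, 1.0

# "Neocortex": optimal policy for the reference model. For this scalar LQR problem
# (running cost q*x^2 + r*u^2) the algebraic Riccati equation has a closed-form
# solution; this stands in for the ADP/RL stage described in the abstract.
q, r = 1.0, 0.1
p = (a_ref * r + np.sqrt((a_ref * r) ** 2 + b_ref**2 * q * r)) / b_ref**2
k_opt = b_ref * p / r                        # optimal gain: u_m = -k_opt * x_m

# "Striatum": adaptive term that forces the real plant to follow the optimally
# controlled reference model (simple adaptive high-gain law, assumed form).
gamma, theta = 5.0, 0.0                      # adaptation gain and adaptive parameter

dt, x, x_m = 1e-3, 2.0, 2.0                  # step size and initial states
for _ in range(int(10.0 / dt)):
    u_m = -k_opt * x_m                       # optimal action on the reference model
    e = x - x_m                              # model-following error
    u = u_m + theta * e                      # policy inferred for the real plant
    theta -= gamma * e * e * dt              # adapt until the plant tracks the model
    x += plant(x, u) * dt                    # integrate the real plant (Euler step)
    x_m += (a_ref * x_m + b_ref * u_m) * dt  # integrate the reference model

print(f"plant state {x:.4f}, reference state {x_m:.4f}, error {x - x_m:.2e}")
```

In the paper itself, the neocortex stage is an ADP/RL algorithm and the striatum adaptation is analyzed with Lyapunov arguments; the scalar LQR solution and gradient-style update above are only stand-ins to show how the three components interact.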

     

  • [1]
    D. Liberzon, Calculus of Variations and Optimal Control Theory. Princeton university press, 2011.
    [2]
    F. L. Lewis, Optimal Control. New York, NY, USA: Wiley, 2012.
    [3]
    A. Perrusquía and W. Yu, “Discrete-time H2 neural control using reinforcement learning,” IEEE Trans. Neural Networks and Learning Systems, pp. 1–11, 2020.
    [4]
    J.-H. Kim and F. Lewis, “Model-free h control design for unknown linear discrete-time systems via Q-learining with LMI,” Automatica, vol. 46, pp. 1320–1326, 2010. doi: 10.1016/j.automatica.2010.05.002
    [5]
    F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, “Reinforcement learning and feedback control using natural decision methods to design optimal adaptive controllers,” IEEE Conrol Systems Magazine, vol. 32, no. 6, pp. 76–105, 2012.
    [6]
    Z.-P. Jiang, T. Bian, and W. Gao, “Learning-based control: A tutorial and some recent results,” Foundations and Trends® in Systems and Control, vol. 8, no. 3, 2020.
    [7]
    S. Tu and B. Recht, “The gap between model-based and model-free methods on the linear quadratic regulator: An asymptotic viewpoint,” in Proc. Conf. Learning Theory. PMLR, 2019, pp. 3036–3083.
    [8]
    M. Palanisamy, H. Modares, F. L. Lewis, and M. Aurangzeb, “Continuous-time Q-learning for infinite horizon-discounted cost linear quadratic regulator problems,” IEEE Trans. Cybernetics, vol. 45, no. 2, pp. 165–176, 2015. doi: 10.1109/TCYB.2014.2322116
    [9]
    B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042–2062, 2018. doi: 10.1109/TNNLS.2017.2773458
    [10]
    B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpor, and M.-B. NaghibiSistani, “Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics,” Automatica, vol. 50, pp. 1167–1175, 2014. doi: 10.1016/j.automatica.2014.02.015
    [11]
    H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, “Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems,” Automatica, vol. 50, no. 1, pp. 193–202, 2014. doi: 10.1016/j.automatica.2013.09.043
    [12]
    H. Modares and F. L. Lewis, “Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning,” IEEE Trans. Automatic Control, vol. 59, no. 11, pp. 3051–3056, 2014. doi: 10.1109/TAC.2014.2317301
    [13]
    A. Perrusquía and W. Yu, “Robot position/force control in unknown environment using hybrid reinforcement learning,” Cybernetics and Systems, vol. 51, no. 4, pp. 542–560, 2020. doi: 10.1080/01969722.2020.1758466
    [14]
    Q. Xie, B. Luo, and F. Tan, “Discrete-time LQR optimal tracking control problems using approximate dynamic programming algorithm with disturbance,” in Proc. IEEE 4th Int. Conf. Intelligent Control and Information Processing, 2013, pp. 716–721.
    [15]
    D. Vrabie and F. L. Lewis, “Neural networks approach for continuoustime direct adaptive optimal control for partially unknown nonlinear systems,” Neural Networks, vol. 22, pp. 237–246, 2009. doi: 10.1016/j.neunet.2009.03.008
    [16]
    L. Buşoniu, R. Babuška, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, 2010.
    [17]
    R. Sutton and A. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
    [18]
    A. Perrusquía and W. Yu, “Neural H2 control using continuous-time reinforcement learning,” IEEE Trans. Cybernetics, pp. 1–10, 2020.
    [19]
    M. Wiering and M. van Otterlo, Reinforcement Learning: State-of-Art. Springer, 2012.
    [20]
    I. Grondman, L. Buşoniu, G. A. Lopes, and R. Babǔska, “A survey of actor-critic reinforcement learning: Standard and natural policy gradients,” IEEE Trans. Systems,Man,and Cybernetics,PART C, vol. 42, no. 6, pp. 1291–1307, 2012. doi: 10.1109/TSMCC.2012.2218595
    [21]
    A. Perrusquía and W. Yu, “Continuous-time reinforcement learning for robust control under worst-case uncertainty,” Int. Journal of Systems Science, vol. 52, no. 4, pp. 770–784, 2021. doi: 10.1080/00207721.2020.1839142
    [22]
    A. Al-Tamimi, F. Lewis, and M. Abu-Khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof,” IEEE Trans. System,Man,and Cybernetics Part B,Cybernetics, vol. 38, no. 4, pp. 943–949, 2008. doi: 10.1109/TSMCB.2008.926614
    [23]
    A. Gheibi, A. Ghiasi, S. Ghaemi, and M. Badamchizadeh, “Designing of robust adaptive passivity-based controller based on reinforcement learning for nonlinear port-Hamiltonian model with disturbance,” Int. Journal of Control, vol. 93, no. 8, pp. 1754–1764, 2020. doi: 10.1080/00207179.2018.1532607
    [24]
    M. Tomás-Rodríguez and S. P. Banks, Linear, Time-Varying Approximations to Nonlinear Dynamical Systems: With Applications in Control and Optimization. Springer Science & Business Media, 2010, vol. 400.
    [25]
    C. Wang, Y. Li, S. Sam Ge, and T. Heng Lee, “Optimal critic learning for robot control in time-varying environments,” IEEE Trans. Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2301–2310, 2015. doi: 10.1109/TNNLS.2014.2378812
    [26]
    R. Kamalapurkar, Walters, and W. Dixon, “Model-based reinforcement learning for approximate optimal regulation,” Automatica, vol. 64, pp. 94–104, 2016. doi: 10.1016/j.automatica.2015.10.039
    [27]
    A. Perrusquía, W. Yu, and A. Soria, “Position/force control of robot manipulators using reinforcement learning,” Industrial Robot: The International Jounral of Robotics Research and Application, vol. 46, no. 2, pp. 267–280, 2019. doi: 10.1108/IR-10-2018-0209
    [28]
    H. Zhang, D. Liu, Y. Luo, and D. Wang, Adaptive Dynamic Programming for Control. London, U.K.: Springer-Verlag, 2013.
    [29]
    B. Kiumarsi and F. L. Lewis, “Actor-critic based optimal tracking for partially unknown nonlinear discrete-time systems,” IEEE Trans. Neural Networks and Learning Systems, vol. 26, no. 1, pp. 140–151, 2015. doi: 10.1109/TNNLS.2014.2358227
    [30]
    C. Mu, Z. Ni, C. Sun, and H. He, “Air-breathing hypersonic vehicle tracking control based on adaptive dynamic programming,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 584–598, 2016.
    [31]
    C. Mu, Z. Ni, C. Sun, and H. He, “Data-driven tracking control with adaptive dynamic programming for a class of continuous-time nonlinear systems,” IEEE Trans. Cybernetics, vol. 47, no. 6, pp. 1460–1470, 2016.
    [32]
    B. Pang and Z.-P. Jiang, “Robust reinforcement learning: A case study in linear quadratic regulation,” arXiv preprint arXiv: 2008.11592, 2020.
    [33]
    K. G. Vamvoudakis, “Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach,” Systems &Control Letters, pp. 14–20, 2017.
    [34]
    F. L. Lewis, S. Jagannathan, and A. Yeşildirek, Neural Network Control of Robot Manipulators and Nonlinear Systems. Taylor & Francis, 1999.
    [35]
    A. Perrusquía and W. Yu, “Identification and optimal control of nonlinear systems using recurrent neural networks and reinforcement learning: An overview,” Neurocomputing, vol. 438, pp. 145–154, 2021. doi: 10.1016/j.neucom.2021.01.096
    [36]
    J. Young Lee, J. B. Park, and Y. H. Choi, “Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations,” IEEE Trans. Neural Networks and Learning Systems, vol. 26, no. 5, 2015.
    [37]
    A. Perrusquía, W. Yu, and X. Li, “Multi-agent reinforcement learning for redundant robot control in task-space,” Int. Journal of Machine Learning and Cybernetics, vol. 12, no. 1, pp. 231–241, 2021. doi: 10.1007/s13042-020-01167-7
    [38]
    V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236
    [39]
    A. Perrusquía and W. Yu, “Robust control under worst-case uncertainty for unknown nonlinear systems using modified reinforcement learning,” Int. Journal of Robust and Nonlinear Control, vol. 30, no. 7, pp. 2920–2936, 2020. doi: 10.1002/rnc.4911
    [40]
    J. Ramírez, W. Yu, and A. Perrusquía, “Model-free reinforcement learning from expert demonstrations: A survey,” Artificial Intelligence Review, pp. 1–29, 2021.
    [41]
    A. Perrusquía, W. Yu, and X. Li, “Nonlinear control using human behavior learning,” Information Sciences, vol. 569, pp. 358–375, 2021. doi: 10.1016/j.ins.2021.03.043
    [42]
    B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, “Building machines that learn and think like people,” Behavioral and Brain Sciences, vol. 40, 2017.
    [43]
    D. Kumaran, D. Hassabis, and J. L. McClelland, “What learning systems do intelligent agents need? Complementary learning systems theory updated” Trends in Cognitive Sciences, vol. 20, no. 7, pp. 512–534, 2016. doi: 10.1016/j.tics.2016.05.004
    [44]
    A. Perrusquía, “A complementary learning approach for expertise transference of human-optimized controllers,” Neural Networks, vol. 145, pp. 33–41, 2021.
    [45]
    R. C. O’Reilly, R. Bhattacharyya, M. D. Howard, and N. Ketz, “Complementary learning systems,” Cognitive Science, vol. 38, no. 6, pp. 1229–1248, 2014. doi: 10.1111/j.1551-6709.2011.01214.x
    [46]
    H. F. Ólafsdóttir, D. Bush, and C. Barry, “The role of hippocampal replay in memory and planning,” Current Biology, vol. 28, no. 1, pp. R37–R50, 2018. doi: 10.1016/j.cub.2017.10.073
    [47]
    M. G. Mattar and N. D. Daw, “Prioritized memory access explains planning and hippocampal replay,” Nature Neuroscience, vol. 21, no. 11, pp. 1609–1617, 2018. doi: 10.1038/s41593-018-0232-z
    [48]
    A. Perrusquía, “Human-behavior learning: A new complementary learning perspective for optimal decision making controllers,” Neurocomputing, 2022.
    [49]
    A. Vilà-Balló, E. Mas-Herrero, Ripollés, et al., “Unraveling the role of the hippocampus in reversal learning,” Journal of Neuroscience, vol. 37, no. 28, pp. 6686–6697, 2017. doi: 10.1523/JNEUROSCI.3212-16.2017
    [50]
    K. L. Stachenfeld, M. M. Botvinick, and S. J. Gershman, “The hippocampus as a predictive map,” Nature Neuroscience, vol. 20, no. 11, pp. 1643–1653, 2017. doi: 10.1038/nn.4650
    [51]
    S. Blakeman and D. Mareschal, “A complementary learning systems approach to temporal difference learning,” Neural Networks, vol. 122, pp. 218–230, 2020. doi: 10.1016/j.neunet.2019.10.011
    [52]
    W. Schultz, Apicella, E. Scarnati, and T. Ljungberg, “Neuronal activity in monkey ventral striatum related to the expectation of reward,” Journal of Neuroscience, vol. 12, no. 12, pp. 4595–4610, 1992. doi: 10.1523/JNEUROSCI.12-12-04595.1992
    [53]
    J. L. McClelland, B. L. McNaughton, and R. C. O’Reilly, “Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory,” Psychological Review, vol. 102, no. 3, p. 419, 1995.
    [54]
    K. Vamvoudakis and F. L. Lewis, “On-line actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem,” Automatica, vol. 46, pp. 878–888, 2010. doi: 10.1016/j.automatica.2010.02.018
    [55]
    T. Cimen, “Survey of state-dependent riccati equation in nonlinear optimal feedback control synthesis,” Journal of Guidance,Control,and Dynamics, vol. 35, no. 4, pp. 1025–1047, 2012. doi: 10.2514/1.55821
    [56]
    N. Babaei and M. U. Salamci, “State dependent riccati equation based model reference adaptive control design for nonlinear systems,” in Proc. IEEE XXIV Int. Conf. Information, Communication and Automation Technologies, 2013, pp. 1–8.
    [57]
    N. Babaei and M. U. Salamci, “State dependent riccati equation based model reference adaptive stabilization of nonlinear systems with application to cancer treatment,” IFAC Proceedings Volumes, vol. 47, no. 3, pp. 1296–1301, 2014. doi: 10.3182/20140824-6-ZA-1003.02282
    [58]
    S. R. Nekoo, “Model reference adaptive state-dependent Riccati equation control of nonlinear uncertain systems: Regulation and tracking of free-floating space manipulators,” Aerospace Science and Technology, vol. 84, pp. 348–360, 2019. doi: 10.1016/j.ast.2018.10.005
    [59]
    N. T. Nguyen, “Model-reference adaptive control,” in Model-Reference Adaptive Control. Springer, 2018, pp. 83–123.
    [60]
    J. R. Cloutier, D. T. Stansbery, and M. Sznaier, “On the recoverability of nonlinear state feedback laws by extended linearization control techniques,” in Proc. IEEE American Control Conf. (Cat. No. 99CH36251), vol. 3, 1999, pp. 1515–1519.
    [61]
    H. Modares, F. L. Lewis, and Z.-P. Jiang, “H tracking control of completely unknown continuous-time systems via off-policy reinforcement learning,” IEEE Trans. Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2550–2562, 2015. doi: 10.1109/TNNLS.2015.2441749
    [62]
    C.-Y. Lee and J.-J. Lee, “Adaptive control for uncertain nonlinear systems based on multiple neural networks,” IEEE Trans. Systems Man and Cybernetics Part B, vol. 34, no. 1, pp. 325–333, 2004. doi: 10.1109/TSMCB.2003.811520
    [63]
    D. Luviano and W. Yu, “Continuous-time path planning for multi-agents with fuzzy reinforcement learning,” Journal of Intelligent &Fuzzy Systems, vol. 33, pp. 491–501, 2017.
    [64]
    W. Yu and A. Perrusquía, “Simplified stable admittance control using end-effector orientations,” Int. Journal of Social Robotics, vol. 12, no. 5, pp. 1061–1073, 2020. doi: 10.1007/s12369-019-00579-y
    [65]
    M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Modeling and Control. John Wiley & Sons, 2020.



    Highlights

    • The paper gives a solution to the migration problem from simulation to real-world experiments to ensure safety-critical control
    • A novel optimal control solution for nonlinear systems based on a human-behavior learning approach
    • The algorithm does not require solving a Hamilton-Jacobi-Bellman (HJB) equation and does not require knowledge of the real parameters of the system (the standard formulation it avoids is sketched after this list)
    • The final control policy applied to the real system is not biased by simulation constraints
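
    For context, the HJB equation mentioned in the third highlight is the standard infinite-horizon formulation for a nonlinear system of the form ẋ = f(x) + g(x)u with running cost Q(x) + uᵀRu. The sketch below uses generic notation (V, f, g, Q, R) assumed for illustration and not necessarily the paper's; it only indicates what the proposed approach avoids solving directly for the real plant.

```latex
% Standard infinite-horizon HJB formulation for \dot{x} = f(x) + g(x)u with
% running cost Q(x) + u^T R u; generic illustrative notation, not necessarily the paper's.
\begin{align}
0        &= Q(x) + \nabla V^{\top}(x)\, f(x)
            - \tfrac{1}{4}\, \nabla V^{\top}(x)\, g(x) R^{-1} g^{\top}(x)\, \nabla V(x), \\
u^{*}(x) &= -\tfrac{1}{2}\, R^{-1} g^{\top}(x)\, \nabla V(x).
\end{align}
```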
