A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 9, Issue 1, Jan. 2022

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Y. L. Yang, Z. H. Ding, R. Wang, H. Modares, and D. C. Wunsch, “Data-driven human-robot interaction without velocity measurement using off-policy reinforcement learning,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 1, pp. 47–63, Jan. 2022. doi: 10.1109/JAS.2021.1004258

Data-Driven Human-Robot Interaction Without Velocity Measurement Using Off-Policy Reinforcement Learning

doi: 10.1109/JAS.2021.1004258
Funds: This work was supported in part by the National Natural Science Foundation of China (61903028), the Youth Innovation Promotion Association, Chinese Academy of Sciences (2020137), the Lifelong Learning Machines Program from DARPA/Microsystems Technology Office, and the Army Research Laboratory (W911NF-18-2-0260).
  • In this paper, we present a novel data-driven design method for the human-robot interaction (HRI) system, where a given task is achieved through cooperation between the human and the robot. The presented HRI controller design is a two-level approach consisting of a task-oriented performance optimization design and a plant-oriented impedance controller design. The task-oriented design minimizes human effort and guarantees perfect task tracking in the outer loop, while the plant-oriented design achieves the desired impedance from the human to the robot manipulator end-effector in the inner loop. Data-driven reinforcement learning techniques are used for performance optimization in the outer loop to assign the optimal impedance parameters. In the inner loop, a velocity-free filter is designed to avoid the need for end-effector velocity measurements. On this basis, an adaptive controller is designed to achieve the desired impedance of the robot manipulator in the task space. Simulations and experiments on a robot manipulator are conducted to verify the efficacy of the presented HRI design framework.
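The outer-loop optimization described above can be illustrated with a small, hedged example. The sketch below is not the paper's algorithm; it runs batch off-policy least-squares policy iteration on an assumed one-DOF mass-damper-spring impedance model, learning an LQR-style feedback gain (a stand-in for the optimal impedance parameters) from data collected under a separate exploratory behavior policy. Every model parameter, cost weight, and variable name in it is an illustrative assumption.

```python
import numpy as np

# Hedged, illustrative sketch only -- NOT the paper's algorithm.
# Off-policy least-squares policy iteration (Q-function evaluation +
# greedy improvement) for an LQR-style outer-loop problem on an
# assumed one-DOF mass-damper-spring impedance model.

np.random.seed(0)

# Assumed discrete-time impedance model, state x = [position; velocity],
# input u = force command.  A, B are used only to generate data; the
# learner never touches them directly.
dt, m_e, d_e, k_e = 0.02, 1.0, 2.0, 10.0
A = np.array([[1.0, dt],
              [-k_e * dt / m_e, 1.0 - d_e * dt / m_e]])
B = np.array([[0.0], [dt / m_e]])
n, m = 2, 1
Qc = np.diag([10.0, 1.0])   # assumed penalty on tracking error / velocity
Rc = np.array([[0.1]])      # assumed penalty on effort

def svec(M):
    """Upper-triangular stack of a symmetric matrix (off-diagonals doubled)."""
    i, j = np.triu_indices(M.shape[0])
    return np.where(i == j, 1.0, 2.0) * M[i, j]

def feats(x, u):
    z = np.concatenate([np.ravel(x), np.ravel(u)])
    return svec(np.outer(z, z))

# 1) Collect data once with an exploratory behavior policy (off-policy:
#    the policy evaluated later differs from this behavior policy).
data, x = [], np.array([1.0, 0.0])
for _ in range(400):
    u = 0.5 * np.random.randn(m)
    x_next = A @ x + (B @ u).ravel()
    cost = x @ Qc @ x + u @ Rc @ u
    data.append((x, u, cost, x_next))
    x = x_next

# 2) Policy iteration over the stored batch; K = 0 is stabilizing here
#    because the assumed nominal impedance model is already stable.
K = np.zeros((m, n))
for _ in range(20):
    Phi = [feats(xk, uk) - feats(xk1, -K @ xk1) for xk, uk, _, xk1 in data]
    c = [ck for _, _, ck, _ in data]
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = np.zeros((n + m, n + m))              # rebuild the symmetric Q-matrix
    H[np.triu_indices(n + m)] = theta
    H = H + H.T - np.diag(np.diag(H))
    K_new = np.linalg.solve(H[n:, n:], H[n:, :n])  # greedy policy improvement
    if np.linalg.norm(K_new - K) < 1e-6:
        break
    K = K_new

print("learned gain (stand-in for optimal impedance parameters):", K)
```

Here the "unknown plant" is just a toy impedance model; in the paper, the outer loop assigns the impedance parameters of the HRI system from measured data in an analogous off-policy fashion.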

     




    Highlights

    • We present a novel two-level HRI controller design framework consisting of a task-oriented performance optimization design and a plant-oriented impedance controller design
    • In the outer loop, a data-driven reinforcement learning technique is used for performance optimization to assign the optimal impedance parameters
    • In the inner loop, a velocity-free filter is designed to avoid the need for end-effector velocity measurements (a generic velocity-surrogate filter is sketched below)
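As a hedged illustration of the last highlight, the sketch below shows one generic way to obtain a velocity surrogate from position measurements only: a first-order "dirty-derivative" filter. It is not the filter designed in the paper; the sample time, filter constant, and test trajectory are assumed solely for the example.

```python
import numpy as np

# Hedged illustration only -- NOT the filter designed in the paper.
# Generic first-order "dirty-derivative" filter producing a velocity
# surrogate from position measurements alone.

dt, tau = 0.001, 0.01                  # assumed sample time and filter constant
t = np.arange(0.0, 2.0, dt)
x = np.sin(2.0 * np.pi * t)            # measured end-effector position (1-DOF)

xi = x[0]                              # filter state (low-passed position)
v_hat = np.zeros_like(x)
for k in range(len(t)):
    v_hat[k] = (x[k] - xi) / tau       # velocity surrogate, no velocity sensor
    xi += dt * (x[k] - xi) / tau       # forward-Euler filter update

v_true = 2.0 * np.pi * np.cos(2.0 * np.pi * t)
err = np.max(np.abs(v_hat - v_true)[500:])   # ignore the initial transient
print("steady-state velocity estimation error:", err)
```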
