A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation.
Volume 6 Issue 4
Jul.  2019

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
    CiteScore: 17.6, Top 3% (Q1)
    Google Scholar h5-index: 77, TOP 5
Citation: Ao Xi, Thushal Wijekoon Mudiyanselage, Dacheng Tao and Chao Chen, "Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning," IEEE/CAA J. Autom. Sinica, vol. 6, no. 4, pp. 938-951, July 2019. doi: 10.1109/JAS.2019.1911567

Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning

doi: 10.1109/JAS.2019.1911567
  • In this work, we combine model-based reinforcement learning (MBRL) and model-free reinforcement learning (MFRL) to stabilize a biped robot (NAO robot) on a rotating platform, where the angular velocity of the platform is unknown to the proposed learning algorithm and is treated as an external disturbance. Nonparametric Gaussian processes normally require a large number of training data points to deal with discontinuities in the estimated model. Although some improved methods, such as probabilistic inference for learning control (PILCO), do not require an explicit global model because the actions are obtained by directly searching the policy space, overfitting and a lack of model complexity may still result in a large deviation between the prediction and the real system. Moreover, none of these approaches considers data error during training or measurement noise during testing. We propose a hierarchical Gaussian process (GP) model containing two layers of independent GPs, from which a physically continuous probability transition model of the robot is obtained. Because the estimate is physically continuous, the algorithm overcomes the overfitting problem with a guaranteed model complexity, and the amount of training data required is also reduced. The policy for any given initial state is generated automatically by minimizing the expected cost according to the predefined cost function and the obtained probability distribution of the state. Furthermore, a novel Q(λ)-based MFRL scheme is employed to improve the policy. Simulation results show that the proposed RL algorithm is able to balance the NAO robot on a rotating platform and can adapt to a platform with varying angular velocity.
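
    The abstract does not spell out the Q(λ) scheme, so the following is only a generic illustration of what a Q(λ)-style update looks like, not the authors' method. It sketches one step of tabular Watkins's Q(λ) with accumulating eligibility traces. The function name, the state and action encoding, and the hyperparameter values are all assumptions made for this example.

```python
from collections import defaultdict


def watkins_q_lambda_step(Q, E, s, a, r, s_next, a_next, actions,
                          alpha=0.1, gamma=0.95, lam=0.8):
    """Apply one tabular Watkins's Q(lambda) update in place.

    Q and E map (state, action) pairs to values and eligibility traces.
    a_next is the action the behaviour policy has already chosen for s_next.
    All names and defaults are illustrative assumptions.
    """
    # Greedy action in the next state (first maximiser on ties).
    best_next = max(actions, key=lambda b: Q[(s_next, b)])
    # The TD error bootstraps from the greedy value, as in Q-learning.
    delta = r + gamma * Q[(s_next, best_next)] - Q[(s, a)]
    # Accumulating trace for the pair that was just visited.
    E[(s, a)] += 1.0
    # The chosen next action counts as greedy if it ties for the maximum.
    is_greedy = Q[(s_next, a_next)] == Q[(s_next, best_next)]
    for key in list(E):
        # Every recently visited pair shares the TD error via its trace.
        Q[key] += alpha * delta * E[key]
        # Decay traces after a greedy choice, cut them after exploration.
        E[key] = gamma * lam * E[key] if is_greedy else 0.0
    return delta


# Hypothetical usage: states and actions are plain integers.
Q = defaultdict(float)
E = defaultdict(float)
ACTIONS = (-1, 1)
watkins_q_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, a_next=1,
                      actions=ACTIONS, alpha=0.5, gamma=0.9, lam=0.5)
```

    Cutting the traces after an exploratory action is what distinguishes Watkins's variant from a naive Q(λ): credit is only propagated backwards along stretches of greedy behaviour, so the update stays consistent with the greedy target policy.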







