Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning

Ao Xi; Thushal Wijekoon Mudiyanselage; Dacheng Tao; Chao Chen

doi:10.1109/JAS.2019.1911567

Volume 6 Issue 4

Jul. 2019

IEEE/CAA Journal of Automatica Sinica



Ao Xi, Thushal Wijekoon Mudiyanselage, Dacheng Tao and Chao Chen, "Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning," IEEE/CAA J. Autom. Sinica, vol. 6, no. 4, pp. 938-951, July 2019. doi: 10.1109/JAS.2019.1911567


Author Bio:
Ao Xi received the B.Eng. degree in automation from Northwestern Polytechnical University in Xi’an, and the Postgraduate Diploma of advanced control and systems engineering at the University of Manchester in Manchester, in 2014 and 2015, respectively. He is currently pursuing the Ph.D. degree at Monash University, Melbourne. Since 2017, he has been a Research Assistant in the Laboratory of Motion Generation and Analysis at Monash University. His research interests include humanoid biped robots, reinforcement learning, deep reinforcement learning, biped robot gait generation, flight control systems, and control theories. Mr. Xi received awards including the 2012 Technology Star First Class Scholarship, the 2013 Endress & Hauser First Class Scholarship, and the 2012 and 2013 Merit Student First Class Scholarship at Northwestern Polytechnical University.

Thushal Wijekoon Mudiyanselage is currently pursuing the B.Eng. degree in mechatronics at Monash University in Melbourne. Since 2018, he has been a Mechanical Design Engineer at Dara Switchboards in Melbourne. His research interests include the application of machine learning techniques in robotics and bipedal robots.

Dacheng Tao (F’15) is a Professor of computer science and ARC Laureate Fellow in the School of Computer Science and the Faculty of Engineering, and the Inaugural Director of the UBTECH Sydney Artificial Intelligence Centre, at the University of Sydney. He mainly applies statistics and mathematics to Artificial Intelligence and Data Science. His research results have been expounded in one monograph and 200+ publications in prestigious journals and at prominent conferences, such as IEEE T-PAMI, T-IP, T-SP, T-NNLS, T-CYB, IJCV, JMLR, NIPS, ICML, CVPR, ICCV, ECCV, ICDM, and KDD, with several best paper awards, such as the IEEE ICDM’07 best theory/algorithm paper runner up award, the IEEE ICDM’13 best student paper award, the 2014 ICDM 10-year highest-impact paper award, the 2017 IEEE Signal Processing Society Best Paper Award, and the 2018 IJCAI distinguished paper award. He received the 2015 Australian Scopus-Eureka Prize and the 2018 IEEE ICDM Research Contributions Award. He is a Fellow of the Australian Academy of Science, AAAS, IEEE, IAPR, OSA and SPIE.

Chao Chen received the B.Eng. degree in mechanical engineering from Shanghai Jiao Tong University in Shanghai, and the M.Eng. and Ph.D. degrees in mechanical engineering from McGill University in Montreal, in 1996, 2002, and 2006, respectively. From 2006 to 2007, he was a Postdoctoral Fellow at the University of Toronto. From 2007 to 2010, he was a Lecturer in the Department of Mechanical and Aerospace Engineering, Monash University in Melbourne. Since 2011, he has been a Senior Lecturer at Monash University. Dr. Chen is also an Adjunct Associate Professor at the Chinese University of Hong Kong, and was a Visiting Professor in IRCCyN, Ecole Centrale de Nantes. His research interests include robotic design and control, theory of mechanisms, robotic exoskeleton, robotic surgery, agricultural robots, and humanoid robots. Dr. Chen received awards including the 1996 Dean’s Honor List at Shanghai Jiao Tong University, the 2004 ASME International Scholarship, the 2005 FQRNT Doctorate Fellowship, the 2006-2007 FQRNT Post-Doctorate Fellowship, and the 2017 Year’s Innovation Award by Australia Hand Therapy Association.
Corresponding author: A. Xi and C. Chen are with the Department of Mechanical and Aerospace Engineering, Monash University, Clayton, VIC 3800, Australia (e-mail: ao.xi@monash.edu)
Received Date: 2018-12-04
Revised Date: 2019-04-08
Accepted Date: 2019-04-27

Available Online: 2019-06-22

Abstract

In this work, we combine model-based reinforcement learning (MBRL) and model-free reinforcement learning (MFRL) to stabilize a biped robot (NAO robot) on a rotating platform, where the angular velocity of the platform is unknown to the proposed learning algorithm and is treated as an external disturbance. Nonparametric Gaussian processes normally require a large number of training data points to deal with the discontinuity of the estimated model. Although some improved methods, such as probabilistic inference for learning control (PILCO), do not require an explicit global model because the actions are obtained by directly searching the policy space, overfitting and a lack of model complexity may still result in a large deviation between the prediction and the real system. Moreover, none of these approaches considers data error during training or measurement noise during testing. We propose a hierarchical Gaussian process (GP) model containing two layers of independent GPs, from which a physically continuous probability transition model of the robot is obtained. Because the estimate is physically continuous, the algorithm overcomes the overfitting problem with a guaranteed model complexity, and the amount of training data required is also reduced. The policy for any given initial state is generated automatically by minimizing the expected cost according to the predefined cost function and the obtained probability distribution of the state. Furthermore, a novel Q(λ)-based MFRL scheme is employed to improve the policy. Simulation results show that the proposed RL algorithm is able to balance the NAO robot on a rotating platform and is capable of adapting to a platform with varying angular velocity.
- Biped robot
- Gaussian processes (GP)
- reinforcement learning
- temporal difference
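
The abstract names a Q(λ)-based model-free step but does not spell out the update rule. As a rough illustration only, the sketch below implements one step of the textbook Watkins's Q(λ) with accumulating eligibility traces on a small tabular problem. It is not the authors' novel scheme: the discrete state and action indices, the function name `q_lambda_step`, and the values of `alpha`, `gamma`, and `lam` are all assumptions made for this example.

```python
def q_lambda_step(q, e, s, a, r, s_next, a_next, done,
                  alpha=0.1, gamma=0.95, lam=0.8):
    """Apply one Watkins's Q(lambda) update in place (hypothetical helper).

    q and e are lists of lists indexed as [state][action]; q holds action
    values and e holds eligibility traces. Returns the TD error.
    """
    best = max(q[s_next])
    # The next action counts as greedy if it attains the maximum (ties included).
    greedy_next = q[s_next][a_next] == best
    # Q-learning target: bootstrap from the best next action unless terminal.
    target = r if done else r + gamma * best
    delta = target - q[s][a]
    e[s][a] += 1.0  # accumulating trace for the visited pair
    for i in range(len(q)):
        for j in range(len(q[i])):
            q[i][j] += alpha * delta * e[i][j]
            # Decay traces after a greedy action; cut them after an
            # exploratory action or at the end of the episode.
            if greedy_next and not done:
                e[i][j] *= gamma * lam
            else:
                e[i][j] = 0.0
    return delta


# Two-step toy episode (assumed example): 0 -> 1 -> terminal, reward 1 at the end.
q = [[0.0, 0.0] for _ in range(3)]
e = [[0.0, 0.0] for _ in range(3)]
q_lambda_step(q, e, s=0, a=1, r=0.0, s_next=1, a_next=1, done=False)
q_lambda_step(q, e, s=1, a=1, r=1.0, s_next=2, a_next=0, done=True)
print(q)
```

In this toy episode the final reward updates not only the last state-action pair but also the earlier one, scaled by the decayed trace `gamma * lam`. That backward propagation of credit is what distinguishes Q(λ) from one-step Q-learning.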


