Volume 6
							Issue 4 
						IEEE/CAA Journal of Automatica Sinica
| Citation: | Ao Xi, Thushal Wijekoon Mudiyanselage, Dacheng Tao and Chao Chen, "Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning," IEEE/CAA J. Autom. Sinica, vol. 6, no. 4, pp. 938-951, July 2019. doi: 10.1109/JAS.2019.1911567 | 
	                | [1] | 
					 E. J. Molinos, A. Llamazares, N. Hernández, R. Arroyo, A. Cela, and J. J. Yebes, " Perception and navigation in unknown environments: the DARPA robotics challenge,” in Proc. 1st Iberian Robotics Conf., pp. 321–329, 2013. 
						
					 | 
			
| [2] | 
					 M. Fujita, Y. Kuroki, T. Ishida, and T. T. Doi, " A small humanoid robot sdr-4x for entertainment applications,” in Proc. Int. Conf. Advanced Intelligent Mechatronics, pp. 938–943, 2003. 
						
					 | 
			
| [3] | 
					 C. Chevallereau, G. Bessonnet, G. Abba, and Y. Aoustin, Bipedal Robots: Modeling, Design and Walking Synthesis, Wiley-ISTE, 1st Edition, 2008. 
						
					 | 
			
| [4] | 
					 M. Vukobratovic and B. Borovac, " Zero-moment point thirty-five years of its life,” Internat. J. Human Robotics, 2004. 
						
					 | 
			
| [5] | 
					 B. J. Lee, D. Stonier, Y. D. Kim, J. K. Yoo, and J. H. Kim, " Modifiable walking pattern of a humanoid robot by using allowable ZMP Variation,” IEEE Trans. Robotics, vol. 24, no. 4, pp. 917–925, 2008. doi:  10.1109/TRO.2008.926859 
						
					 | 
			
| [6] | 
					 K. Nishiwaki and S. Kagami, " Strategies for adjusting the ZMP reference trajectory for maintaining balance in humanoid walking,” in Proc. IEEE Int. Conf. Robotics and Automation, 2010. 
						
					 | 
			
| [7] | 
					 K. Hirai, M. Hirose, Y. Haikawa, and T. Takenaka, " The development of honda humanoid robot,” in Proc. IEEE Int. Conf. Robotics and Automation, vol. 2, pp. 1321–1326, May 1998. 
						
					 | 
			
| [8] | 
					 P. M. Wensing and G. B. Hammam, " Optimizing foot centers of pressure through force distribution in a humanoid robot,” Inter. J. Humanoid Robotics, vol. 10, no. 3, 2013. 
						
					 | 
			
| [9] | 
					 H. Zhao, A. Hereid, W. Ma, and A. D. Ames, " Multi-contact bipedal robotic locomotion,” Robotica, vol. 35, no. 5, pp. 1072–1106, 2017. doi:  10.1017/S0263574715000995 
						
					 | 
			
| [10] | 
					 U. Huzaifa, C. Maguire, and A. LaViers, " Toward an expressive bipedal robot: variable gait synthesis and validation in a planar model,” arXiv: 1808.05594v2, 2018. 
						
					 | 
			
| [11] | 
					 T. Takenaka, T. Matsumoto, T. Yoshiike, T. Hasegawa, S. Shirokura, H. Kaneko, and A. Orita, " Real time motion generation and control for biped robot-4th report: integrated balance control,” in Proc. IEEE/RSJ Inter. Conf. on Intelligent Robots and Systems, pp. 1601–1608, 2009. 
						
					 | 
			
| [12] | 
					 J. J. Alcaraz-Jiménez, D. Herrero-Pérez, and H. Martínez-Barberá, " Robust feedback control of ZMP-based gait for the humanoid robot Nao,” Int. J. Robot. Res, vol. 32, no. 9–10, pp. 1074–1088, 2013. doi:  10.1177/0278364913487566 
						
					 | 
			
| [13] | 
					 K. Seo, J. Kim, and K. Roh, " Towards natural bipedal walking: virtual gravity compensation and capture point control,” in Proc. IEEE/RSJ Inter. Conf. on Intelligent Robots and Systems (IROS), pp. 4019–4026, 2012. 
						
					 | 
			
| [14] | 
					 J. Kim, T.T. Tran, C.V. Dang, and B. Kang, " Motion and walking stabilization of humanoids using sensory reflex control,” Inter. J. Advanced Robotic Systems (IJARS) 
						
					 | 
			
| [15] | 
					 M. Shahbazi, G.A.D. Lopes, and R. Babuska, " Observer-based postural balance control for humanoid robots,” in Proc. IEEE Int. Conf. Robotics and Biomimetics (ROBIO), pp. 891–896, 2013. 
						
					 | 
			
| [16] | 
					 S. Wang, W. Chavalitwongse, and B. Robert, " Machine learning algorithms in bipedal robot control,” IEEE Trans. System,Man,and Cybernetics:Part C Application and Reviews, vol. 42, no. 5, pp. 728–743, 2012. doi:  10.1109/TSMCC.2012.2186565 
						
					 | 
			
| [17] | 
					 I. Mordatch, N. Mishra, C. Eppenr, and P. Abbeel, " Combining model-based policy search with online model learning for control of physical humanoids,” in Proc. Int. Conf. Robotics and Automation, pp. 242–248, 2016. 
						
					 | 
			
| [18] | 
					 P. Hénaff, V. Scesa, F.B. Ouezdou, and O. Bruneau, " Real time implementation of CTRNN and BPTT algorithm to learn on-line biped robot balance: Experiments on the standing posture,” Control Eng. Pract., vol. 19, no. 1, pp. 89–99, 2011. doi:  10.1016/j.conengprac.2010.10.002 
						
					 | 
			
| [19] | 
					 A. A. Saputra and I. A. Sulistijono, " Biologically inspired control system for 3D locomotion of a humanoid biped robot,” IEEE Trans. System,Man,and Cybernetics:Systems, vol. 46, no. 7, pp. 898–911, 2016. doi:  10.1109/TSMC.2015.2497250 
						
					 | 
			
| [20] | 
					 J. P. Ferreira, M. Crisostomo, and A.P. Coimbra, " Rejection of an external force in the sagittal plane applied on a biped robot using a neuro-fuzzy controller,” in Proc. Int. Conf. Advanced Robotics, pp. 1–6. IEEE, 2009. 
						
					 | 
			
| [21] | 
					 F. Farzadpour and M. Danesh, " A new hybrid intelligent control algorithm for a seven-link biped walking robot,” J. Vibration and Control, vol. 20, no. 9, pp. 1378–1393, 2014. doi:  10.1177/1077546312470476 
						
					 | 
			
| [22] | 
					 K. S. Hwang, W. C. Jiang, Y. J. Chen, and H. Shi, " Gait balance and acceleration of a biped robot based on Q-learning,” IEEE Access, vol. 4, pp. 2439–2449, Jun. 2016. doi:  10.1109/ACCESS.2016.2570255 
						
					 | 
			
| [23] | 
					 K. S. Hwang, W. C. Jiang, Y. J. Chen, and H. Shi, " Motion segmentation and balancing for a biped robot’s imitation learning,” IEEE Trans. Industrial Information, vol. 13, no. 3, Jun. 2017. 
						
					 | 
			
| [24] | 
					 W. Wu and L. Gao, " Posture self-stabilizer of a bipedal robot based on training platform and reinforcement learning,” J. Robotics and Autonomous Systems, vol. 98, pp. 42–55, 2017. doi:  10.1016/j.robot.2017.09.001 
						
					 | 
			
| [25] | 
					 A. S. Polydoros and L. Nalpantidis, " Survey of model-based rinforcement learning: application on robotics,” J. intelligent Robotic Systems, vol. 86, pp. 153–173, 2017. doi:  10.1007/s10846-017-0468-y 
						
					 | 
			
| [26] | 
					 M. P. Deisenroth and C. E. Rasmussen, " PILCO: a model-based and data-efficient approach to policy search,” in Proc. 28th Int. Conf. Machine Learning, pp. 465–472, 2011. 
						
					 | 
			
| [27] | 
					 M. Cutler and J. How, " Efficient reinforcement learning for robots using informative simulated priors,” in Proc. Int. Conf. Robotics and Automation, pp. 2605–2612, 2015. 
						
					 | 
			
| [28] | 
					 P. Englert, A. paraschos, J. Peters, and M. P. Deisenroth, " Model-based imitation learning by probabilistic trajectory matching,” in Proc. Int. Conf. Robotics and Automation, pp. 1922–1927, 2013. 
						
					 | 
			
| [29] | 
					 Y. Gal, R. T. McAllister, and C. E. Rasmussen, " Improving PILCO with Bayesian Neural Network Dynamics Models,” Cambridge University, 2017. 
						
					 | 
			
| [30] | 
					 B. Stephens, " Integral control of humanoid balance,” in Proc. Int. Conf. Int. Robotics and Systems, pp. 4020–4027, 2007. 
						
					 | 
			
| [31] | 
					 J. Kober, J. A. Bagnell, and J. Peters, " Reinforcement learning in robotics: a survey,” Int. J. Robot. Res., vol. 32, no. 11, pp. 1238–1274, 2013. doi:  10.1177/0278364913495721 
						
					 | 
			
| [32] | 
					 C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, Cambridge, MA, USA, 2006. 
						
					 | 
			
| [33] | 
					 A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, zond Edition (Chapman & Hall/CRC Texts in Statistical Science), 3rd Edition, Nov. 2013. 
						
					 | 
			
| [34] | 
					 D. J. C. Mackay, " Comparison of approximate methods for handling hyper-parameters,” Neural Computation, vol. 11, no. 5, pp. 1035–1068, 1999. doi:  10.1162/089976699300016331 
						
					 | 
			
| [35] | 
					 M. Deisenroth, D. Fox, and C. E. Rasmussen, " Gaussian processes for data-efficient learning in robotics and control,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 37, no. 2, pp. 408–423, 2015. doi:  10.1109/TPAMI.2013.218 
						
					 | 
			
| [36] | 
					 M. Deisenroth, G. Neumann, and J. Peters, " A Survey of policy search for robotics,” Foundations and Trends in Robotics, vol. 2, no. 1–2, pp. 1–142, 2013. 
						
					 | 
			
| [37] | 
					 C. J. C. H. Watkins, and P. Dayan, " Technical note Q-learning,” Machine Learning, vol. 8, pp. 279–292, 1992. 
						
					 | 
			
| [38] | 
					 R. S. Sutton and A. G. Barto, Reinforcement Learning An Introduction, The MIT Press, USA, 2018. 
						
					 | 
			
| [39] | 
					 J. Feng, C. Fyfe, and L. C. Jain, " Experimental analysis on Sarsa(λ) and Q(λ) with different eligibility traces strategies,” J. Intelligence and Fuzzy System, vol. 20, no. 1–2, pp. 73–82, 2009. 
						
					 |