IEEE/CAA Journal of Automatica Sinica
Citation: Ao Xi, Thushal Wijekoon Mudiyanselage, Dacheng Tao and Chao Chen, "Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning," IEEE/CAA J. Autom. Sinica, vol. 6, no. 4, pp. 938–951, July 2019. doi: 10.1109/JAS.2019.1911567
[1] E. J. Molinos, A. Llamazares, N. Hernández, R. Arroyo, A. Cela, and J. J. Yebes, "Perception and navigation in unknown environments: the DARPA robotics challenge," in Proc. 1st Iberian Robotics Conf., pp. 321–329, 2013.

[2] M. Fujita, Y. Kuroki, T. Ishida, and T. T. Doi, "A small humanoid robot SDR-4X for entertainment applications," in Proc. Int. Conf. Advanced Intelligent Mechatronics, pp. 938–943, 2003.

[3] C. Chevallereau, G. Bessonnet, G. Abba, and Y. Aoustin, Bipedal Robots: Modeling, Design and Walking Synthesis, Wiley-ISTE, 1st Edition, 2008.

[4] M. Vukobratovic and B. Borovac, "Zero-moment point - thirty-five years of its life," Int. J. Humanoid Robotics, 2004.

[5] B. J. Lee, D. Stonier, Y. D. Kim, J. K. Yoo, and J. H. Kim, "Modifiable walking pattern of a humanoid robot by using allowable ZMP variation," IEEE Trans. Robotics, vol. 24, no. 4, pp. 917–925, 2008. doi: 10.1109/TRO.2008.926859

[6] K. Nishiwaki and S. Kagami, "Strategies for adjusting the ZMP reference trajectory for maintaining balance in humanoid walking," in Proc. IEEE Int. Conf. Robotics and Automation, 2010.

[7] K. Hirai, M. Hirose, Y. Haikawa, and T. Takenaka, "The development of Honda humanoid robot," in Proc. IEEE Int. Conf. Robotics and Automation, vol. 2, pp. 1321–1326, May 1998.

[8] P. M. Wensing and G. B. Hammam, "Optimizing foot centers of pressure through force distribution in a humanoid robot," Int. J. Humanoid Robotics, vol. 10, no. 3, 2013.

[9] H. Zhao, A. Hereid, W. Ma, and A. D. Ames, "Multi-contact bipedal robotic locomotion," Robotica, vol. 35, no. 5, pp. 1072–1106, 2017. doi: 10.1017/S0263574715000995

[10] U. Huzaifa, C. Maguire, and A. LaViers, "Toward an expressive bipedal robot: variable gait synthesis and validation in a planar model," arXiv: 1808.05594v2, 2018.

[11] T. Takenaka, T. Matsumoto, T. Yoshiike, T. Hasegawa, S. Shirokura, H. Kaneko, and A. Orita, "Real time motion generation and control for biped robot - 4th report: integrated balance control," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, pp. 1601–1608, 2009.

[12] J. J. Alcaraz-Jiménez, D. Herrero-Pérez, and H. Martínez-Barberá, "Robust feedback control of ZMP-based gait for the humanoid robot Nao," Int. J. Robot. Res., vol. 32, no. 9–10, pp. 1074–1088, 2013. doi: 10.1177/0278364913487566

[13] K. Seo, J. Kim, and K. Roh, "Towards natural bipedal walking: virtual gravity compensation and capture point control," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS), pp. 4019–4026, 2012.

[14] J. Kim, T. T. Tran, C. V. Dang, and B. Kang, "Motion and walking stabilization of humanoids using sensory reflex control," Int. J. Advanced Robotic Systems (IJARS).

[15] M. Shahbazi, G. A. D. Lopes, and R. Babuska, "Observer-based postural balance control for humanoid robots," in Proc. IEEE Int. Conf. Robotics and Biomimetics (ROBIO), pp. 891–896, 2013.

[16] S. Wang, W. Chavalitwongse, and B. Robert, "Machine learning algorithms in bipedal robot control," IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 5, pp. 728–743, 2012. doi: 10.1109/TSMCC.2012.2186565

[17] I. Mordatch, N. Mishra, C. Eppner, and P. Abbeel, "Combining model-based policy search with online model learning for control of physical humanoids," in Proc. Int. Conf. Robotics and Automation, pp. 242–248, 2016.

[18] P. Hénaff, V. Scesa, F. B. Ouezdou, and O. Bruneau, "Real time implementation of CTRNN and BPTT algorithm to learn online biped robot balance: Experiments on the standing posture," Control Eng. Pract., vol. 19, no. 1, pp. 89–99, 2011. doi: 10.1016/j.conengprac.2010.10.002

[19] A. A. Saputra and I. A. Sulistijono, "Biologically inspired control system for 3D locomotion of a humanoid biped robot," IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 46, no. 7, pp. 898–911, 2016. doi: 10.1109/TSMC.2015.2497250

[20] J. P. Ferreira, M. Crisostomo, and A. P. Coimbra, "Rejection of an external force in the sagittal plane applied on a biped robot using a neuro-fuzzy controller," in Proc. Int. Conf. Advanced Robotics, pp. 1–6, 2009.

[21] F. Farzadpour and M. Danesh, "A new hybrid intelligent control algorithm for a seven-link biped walking robot," J. Vibration and Control, vol. 20, no. 9, pp. 1378–1393, 2014. doi: 10.1177/1077546312470476

[22] K. S. Hwang, W. C. Jiang, Y. J. Chen, and H. Shi, "Gait balance and acceleration of a biped robot based on Q-learning," IEEE Access, vol. 4, pp. 2439–2449, Jun. 2016. doi: 10.1109/ACCESS.2016.2570255

[23] K. S. Hwang, W. C. Jiang, Y. J. Chen, and H. Shi, "Motion segmentation and balancing for a biped robot's imitation learning," IEEE Trans. Industrial Informatics, vol. 13, no. 3, Jun. 2017.

[24] W. Wu and L. Gao, "Posture self-stabilizer of a bipedal robot based on training platform and reinforcement learning," Robotics and Autonomous Systems, vol. 98, pp. 42–55, 2017. doi: 10.1016/j.robot.2017.09.001

[25] A. S. Polydoros and L. Nalpantidis, "Survey of model-based reinforcement learning: applications on robotics," J. Intelligent and Robotic Systems, vol. 86, pp. 153–173, 2017. doi: 10.1007/s10846-017-0468-y

[26] M. P. Deisenroth and C. E. Rasmussen, "PILCO: a model-based and data-efficient approach to policy search," in Proc. 28th Int. Conf. Machine Learning, pp. 465–472, 2011.

[27] M. Cutler and J. How, "Efficient reinforcement learning for robots using informative simulated priors," in Proc. Int. Conf. Robotics and Automation, pp. 2605–2612, 2015.

[28] P. Englert, A. Paraschos, J. Peters, and M. P. Deisenroth, "Model-based imitation learning by probabilistic trajectory matching," in Proc. Int. Conf. Robotics and Automation, pp. 1922–1927, 2013.

[29] Y. Gal, R. T. McAllister, and C. E. Rasmussen, "Improving PILCO with Bayesian neural network dynamics models," Cambridge University, 2017.

[30] B. Stephens, "Integral control of humanoid balance," in Proc. Int. Conf. Intelligent Robots and Systems, pp. 4020–4027, 2007.

[31] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: a survey," Int. J. Robot. Res., vol. 32, no. 11, pp. 1238–1274, 2013. doi: 10.1177/0278364913495721

[32] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, Cambridge, MA, USA, 2006.

[33] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, 3rd Edition (Chapman & Hall/CRC Texts in Statistical Science), Nov. 2013.

[34] D. J. C. MacKay, "Comparison of approximate methods for handling hyperparameters," Neural Computation, vol. 11, no. 5, pp. 1035–1068, 1999. doi: 10.1162/089976699300016331

[35] M. Deisenroth, D. Fox, and C. E. Rasmussen, "Gaussian processes for data-efficient learning in robotics and control," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 37, no. 2, pp. 408–423, 2015. doi: 10.1109/TPAMI.2013.218

[36] M. Deisenroth, G. Neumann, and J. Peters, "A survey on policy search for robotics," Foundations and Trends in Robotics, vol. 2, no. 1–2, pp. 1–142, 2013.

[37] C. J. C. H. Watkins and P. Dayan, "Technical note: Q-learning," Machine Learning, vol. 8, pp. 279–292, 1992.

[38] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, USA, 2018.

[39] J. Feng, C. Fyfe, and L. C. Jain, "Experimental analysis on Sarsa(λ) and Q(λ) with different eligibility traces strategies," J. Intelligent and Fuzzy Systems, vol. 20, no. 1–2, pp. 73–82, 2009.
