Reinforcement Learning-Based Adaptive Optimal Control for a Snake Robot

Yang Xiu; Zhiyi Shi; Guanghong Liu; Rob Law; Dongfang Li; Aiguo Song; Edmond Q. Wu

doi:10.1109/JAS.2025.125762

IEEE/CAA Journal of Automatica Sinica




Y. Xiu, Z. Shi, G. Liu, R. Law, D. Li, A. Song, and E. Q. Wu, “Reinforcement learning-based adaptive optimal control for a snake robot,” IEEE/CAA J. Autom. Sinica, early access, 2026. doi: 10.1109/JAS.2025.125762

Funds: This work was supported in part by the National Natural Science Foundation of China (62303117, T2325018, and 62171274), the Military Science and Technology Commission Science and Technology Innovation Project (C1692), and the Fujian Provincial Natural Science Foundation (2024J01278)


Author Bio:
Yang Xiu (Member, IEEE) received his B.E. degree from the College of Automation and Electronic Engineering, Qingdao University of Science and Technology in 2020. He is currently working toward a Ph.D. degree at the School of Mechatronical Engineering, Beijing Institute of Technology. His research interests include adaptive control, guidance control, and snake robots

Zhiyi Shi received the B.S. degree from the School of Automation, Nanjing University of Aeronautics and Astronautics, in 2019. He is currently working toward the Ph.D. degree in the Department of Automation, Shanghai Jiao Tong University. His research interests include deep reinforcement learning and intelligent manipulator control

Guanghong Liu received the B.E. degree in automation and the Ph.D. degree in control theory and control engineering from the University of Science and Technology of China in 2006 and 2011, respectively. He is currently a Researcher with the Information Science Research Institute, China Electronics Technology Group Corporation, Beijing, China. His research focuses on the path-following control of robots

Rob Law is the University of Macau Development Foundation (UMDF) Chair Professor of smart tourism. He is also an Honorary Professor of several other reputable universities. Prior to joining the University of Macau in July 2021, he worked in industry organizations and academic institutes. He is an active researcher. He has received more than 110 research-related awards and accolades, as well as millions of USD in external and internal research grants

Dongfang Li (Member, IEEE) received his B.E. degree in 2014 from Nanjing University of Aeronautics and Astronautics. He received his Ph.D. degree from the School of Mechanical Engineering, Beijing Institute of Technology in 2021. He is currently an Associate Professor with the School of Electrical Engineering and Automation, Fuzhou University. He is also currently doing post-doctoral work at Shanghai Jiao Tong University. His research interests include the path-following of snake robots, obstacle avoidance control of robots, and trajectory tracking control of aircraft. He has published works in IEEE Transactions on Industrial Electronics, IEEE Transactions on Industrial Informatics, and IEEE Transactions on Systems, Man, and Cybernetics: Systems

Aiguo Song (Senior Member, IEEE) received the B.S. degree in automatic control and the M.S. degree in measurement and control from the Nanjing University of Aeronautics and Astronautics, in 1990 and 1993, respectively, and the Ph.D. degree in measurement and control from Southeast University in 1996. From 1996 to 1998, he was a Professor with the Department of Instrument Science and Engineering, Southeast University. His research interests include haptic displays, robot tactile sensors, and tele-rehabilitation robots

Edmond Q. Wu (Senior Member, IEEE) received his Ph.D. degree in control theory and application from Southeast University in 2009. He is a Full Professor at the Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai Jiao Tong University. His research interests include aeronautical neurocognitive science and aircraft control systems. Dr. Wu is currently an Associate Editor of the IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Intelligent Vehicles, and IEEE Transactions on Systems, Man, and Cybernetics: Systems
Corresponding author: Dongfang Li, e-mail: lidongfang@fzu.edu.cn
Received Date: 2024-10-20
Revised Date: 2025-01-25
Accepted Date: 2025-07-27

Available Online: 2026-03-05

Abstract

Because snake robots are difficult to model accurately, model-based control schemes are ineffective, and constraints on motion velocity and energy consumption make the meandering gait challenging. In this work, a two-layer reinforcement learning-based adaptive optimal control framework for snake robots is proposed to achieve trajectory tracking with a gait of optimal energy efficiency. In the optimization layer, a multi-objective problem over gait amplitude, frequency, and phase is established, which balances minimizing energy consumption against maximizing velocity through a weighted sum. Proximal policy optimization yields multiple pairings of gait parameters and performance, from which users can select the preferred combination. In the control layer, an actor-critic-identifier neural network-based reinforcement learning optimal controller is designed to address the unknown dynamics and the difficulty of solving the Bellman equation. It adaptively approximates the cost function and the control policy, reducing the dependence on an accurate model and avoiding computational complexity. Theoretical analysis demonstrates that the proposed method guarantees the stability of the tracking errors of the snake robot with optimal cost. Comparative simulation results show the effectiveness and superiority of the method.
- Adaptive control
- Optimal control
- Reinforcement learning
- Snake robots
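
The weighted-sum trade-off described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' formulation: the serpenoid joint reference used here is the form commonly found in the snake robot literature, and the function names, weights, and (velocity, energy) numbers are all hypothetical placeholders chosen only to show how changing the weights changes which gait is preferred.

```python
import math


def serpenoid_reference(i, t, alpha, omega, delta, offset=0.0):
    """Reference angle of joint i (1-based) at time t for a serpenoid gait.

    alpha: amplitude, omega: angular frequency, delta: phase shift between
    adjacent joints. Assumed standard form, not taken from the paper.
    """
    return alpha * math.sin(omega * t + (i - 1) * delta) + offset


def weighted_sum_reward(velocity, energy, w_velocity, w_energy):
    """Scalarize two objectives: reward forward velocity, penalize energy."""
    return w_velocity * velocity - w_energy * energy


def best_gait(candidates, w_velocity, w_energy):
    """Pick the gait whose (velocity, energy) outcome scores highest."""
    return max(
        candidates,
        key=lambda gait: weighted_sum_reward(*candidates[gait], w_velocity, w_energy),
    )


# Hypothetical outcomes: (alpha, omega, delta) -> (velocity, energy).
# In practice these would come from simulation or a learned policy rollout.
CANDIDATES = {
    (0.3, 1.0, 0.6): (0.10, 0.20),  # slow but frugal gait
    (0.6, 2.0, 0.6): (0.30, 0.90),  # fast but costly gait
}

if __name__ == "__main__":
    print(best_gait(CANDIDATES, w_velocity=0.9, w_energy=0.1))
    print(best_gait(CANDIDATES, w_velocity=0.1, w_energy=0.9))
```

Sweeping the weights in this way produces a family of gait and performance pairings rather than a single answer, which matches the abstract's point that the user selects the final combination.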


