Path Planning for Intelligent Robots Based on Deep Q-learning With Experience Replay and Heuristic Knowledge

Lan Jiang; Hongyun Huang; Zuohua Ding

doi:10.1109/JAS.2019.1911732

Volume 7 Issue 4

Jun. 2020

IEEE/CAA Journal of Automatica Sinica



Lan Jiang, Hongyun Huang and Zuohua Ding, "Path Planning for Intelligent Robots Based on Deep Q-learning With Experience Replay and Heuristic Knowledge," IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1179-1189, July 2020. doi: 10.1109/JAS.2019.1911732


Funds: This work was supported by the National Natural Science Foundation of China (61751210, 61572441)


Author Bio:
Lan Jiang is a master's student with the School of Information Science, Zhejiang Sci-Tech University. Her research interests include machine learning and data analysis.

Hongyun Huang received the M.S. degree in software engineering from Zhejiang Sci-Tech University in 2015, and the bachelor's degree in business administration from Shanghai Jiao Tong University in 2009. She is currently a Data Analyzer at the Center of Multi-Media Big Data of Library, Zhejiang Sci-Tech University. Her research interests include data analysis, system modeling, and AI computing. She has authored and coauthored over 6 papers in these areas.

Zuohua Ding (M’11) received the M.S. degree in computer science and the Ph.D. degree in mathematics, both from the University of South Florida, Tampa, FL, USA, in 1996 and 1998, respectively. From 1998 to 2001, he was a Senior Software Engineer with Advanced Fiber Communication, Petaluma, CA, USA. He has been a Research Professor with the National Institute for Systems Test and Productivity, Vail, CO, USA, since 2001. He is currently a Professor and the Director with the Laboratory of Intelligent Computing and Software Engineering, Zhejiang Sci-Tech University, Hangzhou, China. His research interests include system modeling, program analysis, service computing, software reliability prediction, and Petri nets. He has authored and coauthored over 70 papers.
Corresponding author: L. Jiang and Z. H. Ding are with the Laboratory of Intelligent Computing and Software Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China (e-mail: jianglander@163.com; zouhuading@hotmail.com); H. Y. Huang is with the Center of Multi-Media Big Data of Library, Zhejiang Sci-Tech University, Hangzhou 310018, China (e-mail: huanghongyun07@hotmail.com)
Received Date: 2018-11-01
Revised Date: 2018-12-28
Accepted Date: 2019-01-24

Available Online: 2019-05-17

Abstract

Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to address these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network replaces the Q-table of reinforcement learning, which resolves the “curse of dimensionality” issue of the Q-table. As the robot moves through an unknown environment, it collects experience data that are then used to train the neural network; this process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that, compared with existing methods, our method converges to an optimal action strategy in less time and explores a path in an unknown environment with fewer steps and a larger average reward.
Keywords:
- Deep Q-learning (DQL)
- experience replay (ER)
- heuristic knowledge (HK)
- path planning




Highlights

Fast convergence and better strategy
The simulation results show that, compared with existing methods, our method converges to an optimal action strategy in less time and explores a path in an unknown environment with fewer steps and a larger average reward.
Deep Q-learning
We use DQL to process the state information of the intelligent robot and to estimate the cumulative reward of each corresponding action. The neural network thereby replaces the Q-table of reinforcement learning and solves the “curse of dimensionality”.
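
As a minimal sketch of this idea (not the network used in the paper), the following pure-Python code shows how a small neural network can stand in for a Q-table by mapping a state vector to one estimated cumulative reward per action. The class name `TinyQNetwork`, the layer sizes, the tanh activation, the random initialization, and the discount factor `gamma=0.9` are all assumptions chosen for illustration.

```python
import math
import random


class TinyQNetwork:
    """Hypothetical one-hidden-layer network: state vector -> one Q-value per action."""

    def __init__(self, state_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # Small random weights; the initialization scheme is an assumption.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(state_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                   for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def q_values(self, state):
        """Forward pass: return the estimated Q-value of every action in `state`."""
        hidden = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]

    def greedy_action(self, state):
        """Index of the action with the largest estimated Q-value."""
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)


def td_target(reward, next_q_values, done, gamma=0.9):
    """Standard Q-learning target: r if terminal, else r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)
```

With a Q-table, memory grows with the number of distinct states; here the number of parameters depends only on the layer sizes, which is why a network of this kind can cope with large or continuous state spaces. Training (fitting `q_values` toward `td_target`) is omitted from the sketch.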
Experience replay
Training a neural network requires a lot of data, but when the robot explores an unknown environment, it is impossible to prepare enough training samples in advance. Instead, the robot collects the experience data generated while it moves and stores them in a replay memory. We then use the data in the replay memory to train the neural network.
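
The replay memory described here can be sketched as a fixed-capacity buffer of transitions from which training batches are drawn at random. This is an illustrative sketch only: the names `Transition` and `ReplayMemory`, the transition fields, the capacity, and the uniform sampling are assumptions, not details taken from the paper.

```python
import random
from collections import deque, namedtuple

# One step of experience; the field layout is an assumption for illustration.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Hypothetical fixed-size store of the robot's past transitions."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Record one transition observed while the robot moves."""
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random batch (smaller if the memory is not yet full enough)."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

In a training loop, the robot would call `push` after every step and periodically call `sample` to obtain a batch for updating the network, so that each collected transition can be reused many times.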
Heuristic knowledge
Heuristic knowledge guides the behavior of the robot so that the actions it selects are more purposeful, and it also makes the training of the neural network more effective. With the help of heuristic knowledge, the neural network converges to an optimal action strategy faster.
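
This page does not spell out the specific heuristic rules, so the following is only one plausible sketch of heuristic-guided action selection on a grid. It assumes four moves, a known goal cell, a set of already-detected obstacle cells, and a Manhattan-distance rule; the names `ACTIONS`, `heuristic_actions`, and `select_action` are hypothetical. The robot never picks a move into a known obstacle and, when it can, explores only among moves that bring it closer to the goal.

```python
import random

# Assumed action set on a grid: up, down, left, right as (dx, dy) offsets.
ACTIONS = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}


def manhattan(a, b):
    """Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def heuristic_actions(position, goal, obstacles):
    """Actions that avoid known obstacles, narrowed to goal-approaching ones if any exist."""
    safe = [a for a, (dx, dy) in ACTIONS.items()
            if (position[0] + dx, position[1] + dy) not in obstacles]
    if not safe:
        return list(ACTIONS)  # Boxed in: fall back to the full action set.
    current = manhattan(position, goal)
    closer = [a for a in safe
              if manhattan((position[0] + ACTIONS[a][0],
                            position[1] + ACTIONS[a][1]), goal) < current]
    return closer or safe


def select_action(q_values, position, goal, obstacles, epsilon, rng=random):
    """Epsilon-greedy choice restricted to the heuristically filtered actions."""
    candidates = heuristic_actions(position, goal, obstacles)
    if rng.random() < epsilon:
        return rng.choice(candidates)  # Guided exploration instead of blind exploration.
    return max(candidates, key=lambda a: q_values[a])
```

Restricting the greedy step to the filtered actions as well is a design choice of this sketch; a milder variant would apply the filter only to the exploratory step and let the learned Q-values decide otherwise.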
