Safe Efficient Policy Optimization Algorithm for Unsignalized Intersection Navigation

Xiaolong Chen; Biao Xu; Manjiang Hu; Yougang Bian; Yang Li; Xin Xu

doi:10.1109/JAS.2024.124287

Volume 11 Issue 9

Sep. 2024

IEEE/CAA Journal of Automatica Sinica



X. Chen, B. Xu, M. Hu, Y. Bian, Y. Li, and X. Xu, “Safe efficient policy optimization algorithm for unsignalized intersection navigation,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 9, pp. 2011–2026, Sept. 2024. doi: 10.1109/JAS.2024.124287


Funds: This work was supported by the National Natural Science Foundation of China (52102394, 52172384), the Hunan Provincial Natural Science Foundation of China (2023JJ10008), and the Young Elite Scientists Sponsorship Program by CAST (2022QNRC001).


Author Bio:
Xiaolong Chen received the B.E. degree from Fuzhou University in 2016 and the Ph.D. degree from Hunan University in 2023. His research interests include cooperative control methods for connected and automated vehicles, single-agent and multi-agent reinforcement learning, and decision control for signal-free intersections. He has served as a Peer Reviewer for several prestigious journals, including IEEE Transactions on Intelligent Transportation Systems, IEEE/ASME Transactions on Mechatronics, IEEE Transactions on Computing, IEEE Internet of Things Journal, and IEEE Transactions on Vehicular Technology.

Biao Xu (Member, IEEE) received the B.E. and Ph.D. degrees from Tsinghua University in 2013 and 2018, respectively. From 2016 to 2017, he was a Visiting Scholar with the University of Washington, USA. He is currently an Associate Research Fellow at the College of Mechanical and Vehicle Engineering, Hunan University. His research interests include connected and automated vehicles, vehicle control, and V2I cooperation. Dr. Xu was the recipient of the Best Paper Award at the 14th Intelligent Transportation Systems Asia-Pacific Forum in 2015 and the Best Paper Award at the 2017 IEEE Intelligent Vehicles Symposium.

Manjiang Hu received the B.Tech. and Ph.D. degrees from Jiangsu University in 2009 and 2014, respectively. He was a Postdoctoral Researcher in the Department of Automotive Engineering, Tsinghua University, from 2014 to 2017. He is currently a Professor with the College of Mechanical and Vehicle Engineering, Hunan University. His research interests include cooperative driving assistance technology and vehicle control.

Yougang Bian (Member, IEEE) received the B.E. and Ph.D. degrees from Tsinghua University in 2014 and 2019, respectively. He was a Visiting Scholar with the Department of Electrical and Computer Engineering, University of California at Riverside, USA, from 2017 to 2018. He is currently an Associate Professor at Hunan University. His research interests include distributed control, cooperative control, and their applications to connected and automated vehicles. Dr. Bian was a recipient of the Best Paper Award at the 2017 IEEE Intelligent Vehicles Symposium.

Yang Li received the B.E. degree from Chongqing University in 2014 and the Ph.D. degree from Tsinghua University in 2020. She is currently an Assistant Professor with Hunan University. Her research interests include deep reinforcement learning, multi-agent reinforcement learning, safe reinforcement learning, and transfer learning in the decision-making of automated vehicles.

Xin Xu (Senior Member, IEEE) received the Ph.D. degree in control science and engineering from the College of Mechatronics and Automation, National University of Defense Technology (NUDT) in 2002. He has been a Visiting Professor with Hong Kong Polytechnic University, Hong Kong, China; the University of Alberta, Canada; the University of Guelph, Canada; and the University of Strathclyde, U.K. He is currently a Professor with the College of Intelligence Science and Technology, NUDT. His research interests include intelligent control, reinforcement learning, approximate dynamic programming, machine learning, robotics, and autonomous vehicles. Dr. Xu was the recipient of the National Science Fund for Outstanding Youth in China and the Second-Class National Natural Science Award of China. He has served as an Associate Editor or a Guest Editor for Information Sciences, International Journal of Robotics and Automation, IEEE Transactions on Systems, Man, and Cybernetics: Systems, Intelligent Automation and Soft Computing, International Journal of Adaptive Control and Signal Processing, and Acta Automatica Sinica. He is a Member of the IEEE CIS Technical Committee on Approximate Dynamic Programming and Reinforcement Learning and the IEEE RAS Technical Committee on Robot Learning.
Corresponding author: Biao Xu, e-mail: xubiao@hnu.edu.cn
Received Date: 2023-07-18
Revised Date: 2023-12-25
Accepted Date: 2024-01-30

Available Online: 2024-05-06

Abstract


Unsignalized intersections pose a challenge for autonomous vehicles, which must decide how to navigate them safely and efficiently. This paper proposes a reinforcement learning (RL) method for this task. The method uses a semantic scene representation to handle variable numbers of vehicles and a universal reward function to facilitate stable learning. A collision risk function is designed to penalize unsafe actions and guide the agent away from them. A scalable policy optimization algorithm is introduced to improve data efficiency and safety when learning to drive through intersections. The algorithm employs experience replay to overcome the on-policy limitation of proximal policy optimization and incorporates the collision risk constraint into the policy optimization problem. The proposed safe RL algorithm balances the trade-off between traffic safety and policy learning efficiency. Tests in simulated intersection scenarios show high success rates and low collision rates under different traffic conditions. The results indicate the potential of RL for enhancing the safety and reliability of autonomous driving systems at unsignalized intersections.
Keywords:
- Autonomous driving
- decision-making
- reinforcement learning (RL)
- unsignalized intersection
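
The abstract describes adding a collision risk constraint to a proximal-policy-optimization-style update that reuses replayed data. The exact formulation is not given on this page, so the following is only a minimal sketch of one common way to do this: a clipped surrogate objective with a Lagrangian penalty on a risk cost. It is not the authors' algorithm. All names (`penalized_clipped_surrogate`, `update_multiplier`, `lam`, `cost_limit`) and default values are assumptions made for illustration.

```python
import numpy as np


def penalized_clipped_surrogate(ratio, reward_adv, cost_adv, lam, clip_eps=0.2):
    """Clipped reward surrogate minus a penalty on the collision-risk cost.

    ratio      : pi_new(a|s) / pi_behavior(a|s) for each (possibly replayed) sample
    reward_adv : advantage estimates for the task reward
    cost_adv   : advantage estimates for the collision-risk cost
    lam        : non-negative Lagrange multiplier (hypothetical name)
    """
    ratio = np.asarray(ratio, dtype=float)
    reward_adv = np.asarray(reward_adv, dtype=float)
    cost_adv = np.asarray(cost_adv, dtype=float)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic (lower) bound on the reward improvement, as in clipped PPO.
    reward_term = np.minimum(ratio * reward_adv, clipped * reward_adv)
    # Pessimistic (upper) bound on the cost, so risk is not underestimated.
    cost_term = np.maximum(ratio * cost_adv, clipped * cost_adv)
    # Objective to maximize with respect to the policy parameters.
    return float(np.mean(reward_term - lam * cost_term))


def update_multiplier(lam, mean_episode_cost, cost_limit, lr=0.05):
    """Dual ascent step: raise lam when measured risk exceeds the limit."""
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))
```

In this sketch, a larger `lam` makes the update trade reward for lower risk, and `update_multiplier` relaxes the penalty again once the measured cost falls below `cost_limit`. The ratio is taken against the behavior policy that generated the sample, which is what lets replayed data be reused; how far such reuse is safe depends on details the abstract does not state.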




Figures(12) / Tables(3)


Highlights

Proposed a scalable RL algorithm for safe, efficient navigation at unsignalized intersections
Introduced a semantic scene graph for variable vehicle scenarios and stable learning
Developed a collision risk function to penalize unsafe actions and ensure safety
Demonstrated high success and low collision rates in diverse simulated traffic conditions
Enhanced balance between traffic safety and learning efficiency in autonomous driving
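
The highlights mention a semantic scene representation that copes with a varying number of vehicles. The page does not say how it is built, so the sketch below shows only one simple way to get a fixed-length input from a variable-size scene: one slot per semantic approach direction, filled with the nearest vehicle or a default. This is a slot encoding, not a graph, and not the authors' representation. The `Vehicle` fields, the `APPROACHES` slots, and the normalization constants are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

APPROACHES = ("left", "right", "opposite")  # hypothetical semantic slots
EMPTY_SLOT = (0.0, 1.0, 0.0)  # present flag, normalized distance, normalized speed


@dataclass
class Vehicle:
    approach: str    # arm of the intersection the vehicle comes from
    distance: float  # metres to the conflict area
    speed: float     # metres per second


def encode_scene(vehicles: List[Vehicle], max_dist: float = 100.0,
                 max_speed: float = 20.0) -> List[float]:
    """Return a fixed-length feature vector for any number of vehicles."""
    features: List[float] = []
    for approach in APPROACHES:
        # Only vehicles on this approach and within sensing range compete for the slot.
        candidates = [v for v in vehicles
                      if v.approach == approach and v.distance <= max_dist]
        if not candidates:
            features.extend(EMPTY_SLOT)
            continue
        nearest = min(candidates, key=lambda v: v.distance)
        features.extend((1.0,
                         nearest.distance / max_dist,
                         min(nearest.speed, max_speed) / max_speed))
    return features
```

With these assumptions the output always has `3 * len(APPROACHES)` entries, so the policy network input size does not change with traffic density. The cost is that only one vehicle per approach is represented.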
