A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation.

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
X. Chen, B. Xu, M. Hu, Y. Bian, Y. Li, and X. Xu, “Safe efficient policy optimization algorithm for unsignalized intersection navigation,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 9, pp. 1–16, Sept. 2024. doi: 10.1109/JAS.2024.124287

Safe Efficient Policy Optimization Algorithm for Unsignalized Intersection Navigation

doi: 10.1109/JAS.2024.124287
Funding: This work was supported by the National Natural Science Foundation of China (52102394, 52172384), the Hunan Provincial Natural Science Foundation of China (2023JJ10008), and the Young Elite Scientists Sponsorship Program by CAST (2022QNRC001).
Unsignalized intersections are challenging for autonomous vehicles, which must decide how to cross them both safely and efficiently. This paper proposes a reinforcement learning (RL) method for navigating unsignalized intersections. The method uses a semantic scene representation to handle a variable number of surrounding vehicles and a universal reward function to promote stable learning. A collision risk function is designed to penalize unsafe actions and steer the agent away from them. A scalable policy optimization algorithm is then introduced to improve data efficiency and safety during learning at intersections. The algorithm employs experience replay to overcome the on-policy limitation of proximal policy optimization (PPO) and incorporates the collision risk as a constraint in the policy optimization problem. The resulting safe RL algorithm balances traffic safety against policy learning efficiency. The algorithm is tested in simulated intersection scenarios with different traffic situations, where it achieves high success rates and low collision rates across traffic conditions. These results show the potential of RL for enhancing the safety and reliability of autonomous driving systems at unsignalized intersections.
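The abstract describes the constrained optimization step only at a high level. As a rough illustration of the general technique it names (a PPO-style clipped objective evaluated on replayed samples, with a collision-risk constraint handled through a Lagrange multiplier), the sketch below computes a clipped reward surrogate minus a multiplier-weighted cost surrogate and performs a projected dual-ascent update of the multiplier. This is not the authors' algorithm: the `Sample` fields, the choice to take importance ratios against stored behavior log-probabilities, the unclipped cost term, the multiplier update rule, and the parameter names `eps`, `lam`, `cost_limit`, and `lr` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One replayed transition (hypothetical fields, not from the paper)."""
    logp_new: float        # log pi_theta(a|s) under the policy being optimized
    logp_behavior: float   # log mu(a|s) stored when the action was collected
    advantage: float       # reward advantage estimate
    cost_advantage: float  # collision-risk cost advantage estimate (assumed)


def clipped_surrogate(ratio: float, adv: float, eps: float) -> float:
    """Standard PPO clipped term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * adv, clipped_ratio * adv)


def lagrangian_objective(batch: List[Sample], lam: float, eps: float = 0.2) -> float:
    """Mean of (clipped reward surrogate - lam * cost surrogate) over a batch.

    The ratio is taken against the behavior log-probability stored in the
    replay buffer, so clipping keeps the update close to the policy that
    collected the data. The cost surrogate is left unclipped here, which is
    an assumption (a common choice in PPO-Lagrangian variants).
    """
    total = 0.0
    for s in batch:
        ratio = math.exp(s.logp_new - s.logp_behavior)
        reward_term = clipped_surrogate(ratio, s.advantage, eps)
        cost_term = ratio * s.cost_advantage
        total += reward_term - lam * cost_term
    return total / len(batch)


def update_multiplier(lam: float, avg_cost: float, cost_limit: float, lr: float) -> float:
    """Projected dual ascent on the multiplier (assumed update rule).

    lam grows while the average collision-risk cost exceeds cost_limit and
    shrinks, but never below zero, once the constraint is satisfied.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


if __name__ == "__main__":
    batch = [
        Sample(logp_new=0.0, logp_behavior=0.0, advantage=1.0, cost_advantage=0.5),
        Sample(logp_new=math.log(1.5), logp_behavior=0.0, advantage=2.0, cost_advantage=0.0),
    ]
    # First sample: ratio 1.0 -> 1.0 - 0.5 * 0.5 = 0.75.
    # Second sample: ratio 1.5 is clipped to 1.2 -> 1.2 * 2.0 = 2.4.
    print(lagrangian_objective(batch, lam=0.5))  # about 1.575
    print(update_multiplier(0.0, avg_cost=0.3, cost_limit=0.1, lr=1.0))  # about 0.2
```

Setting `lam = 0` reduces the objective to the plain clipped surrogate on replayed data. In a real implementation the objective would be maximized by gradient ascent on the policy parameters using an autodiff framework; this scalar sketch only shows how the terms combine.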

     





    Figures (12) / Tables (3)

    Article Metrics

    Article views: 133; PDF downloads: 28
