Citation:  Y. Li, Y. Zhang, X. Li, and C. Sun, “Regional multiagent cooperative reinforcement learning for citylevel traffic grid signal control,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 9, pp. 1–12, Sept. 2024. 
[1] 
K. N. Qureshi and A. H. Abdullah, “A survey on intelligent transportation systems,” MiddleEast J. Scientific Research, vol. 15, no. 5, pp. 629–642, 2013.

[2] 
H. Wei, G. Zheng, V. Gayah, and Z. Li, “A survey on traffic signal control methods,” arXiv preprint arXiv: 1904.08117, 2019.

[3] 
A. J. Miller, “Settings for fixedcycle traffic signals,” J. Operational Research Society, vol. 14, no. 4, pp. 373–386, 1963. doi: 10.1057/jors.1963.61

[4] 
A. Salkham, R. Cunningham, A. Garg, and V. Cahill, “A collaborative reinforcement learning approach to urban traffic control optimization.” in Proc. IEEE/WIC/ACM Int. Conf. Web Intelligence and Intelligent Agent Tech., 2008, vol. 2, pp. 560–566.

[5] 
G. F. Newell, “Approximation methods for queues with application to the fixedcycle traffic light,” Siam Review, vol. 7, no. 2, pp. 223–240, 1965. doi: 10.1137/1007038

[6] 
P. Varaiya, “The maxpressure controller for arbitrary networks of signalized intersections,” in Advances in Dynamic Network Modeling in Complex Transportation Systems. New York, USA: Springer, 2013, pp. 27–66.

[7] 
X. Zang, H. Yao, G. Zheng, N. Xu, K. Xu, and Z. Li, “MetaLight: Valuebased metareinforcement learning for traffic signal control,” in Proc. AAAI Conf. Artificial Intelligence, 2020, vol. 34, no. 1, pp. 1153–1160.

[8] 
B. Abdulhai, R. Pringle, and G. J. Karakoulas, “Reinforcement learning for true adaptive traffic signal control,” J. Transportation Engineering, vol. 129, no. 3, pp. 278–285, 2003.

[9] 
X. Liang, X. Du, G. Wang, and Z. Han, “A deep reinforcement learning network for traffic light cycle control,” IEEE Trans. Vehicular Technology, vol. 68, no. 2, pp. 1243–1253, 2019. doi: 10.1109/TVT.2018.2890726

[10] 
L. Li, Y. Lv, and F.Y. Wang, “Traffic signal timing via deep reinforcement learning,” IEEE/CAA J. Autom. Sinica, vol. 3, no. 3, pp. 247–254, 2016. doi: 10.1109/JAS.2016.7508798

[11] 
J. Wu and Y. Lou, “Efficient centralized traffic grid signal control based on metareinforcement learning,” IEEE/CAA J. Autom. Sinica, 2023. DOI: 10.1109/JAS.2023.123270

[12] 
L. Prashanth and S. Bhatnagar, “Reinforcement learning with function approximation for traffic signal control,” IEEE Trans. Intelligent Transportation Systems, vol. 12, no. 2, pp. 412–421, 2010.

[13] 
L. N. Alegre, T. Ziemke, and A. L. Bazzan, “Using reinforcement learning to control traffic signals in a realworld scenario: An approach based on linear function approximation,” IEEE Trans. Intelligent Transportation Systems, vol. 23, no. 7, pp. 9126–9135, 2021.

[14] 
Y. Liu, L. Liu, and W.P. Chen, “Intelligent traffic light control using distributed multiagent Q learning,” in Proc. IEEE 20th Int. Conf. Intelligent Transportation Systems, 2017, pp. 1–8.

[15] 
T. Chu, J. Wang, L. Codecà, and Z. Li, “Multiagent deep reinforcement learning for largescale traffic signal control,” IEEE Trans. Intelligent Transportation Systems, vol. 21, no. 3, pp. 1086–1095, 2019.

[16] 
X. Wang, L. Ke, Z. Qiao, and X. Chai, “Largescale traffic signal control using a novel multiagent reinforcement learning,” IEEE Trans. Cybern., vol. 51, no. 1, pp. 174–187, 2020.

[17] 
Z. Li, H. Yu, G. Zhang, S. Dong, and C.Z. Xu, “Networkwide traffic signal control optimization using a multiagent deep reinforcement learning,” Transportation Research Part C: Emerging Technologies, vol. 125, p. 103059, 2021. doi: 10.1016/j.trc.2021.103059

[18] 
T. Chu, S. Qu, and J. Wang, “Largescale traffic grid signal control with regional reinforcement learning,” in Proc. American Control Conf., 2016, pp. 815–820.

[19] 
T. Tan, F. Bao, Y. Deng, A. Jin, Q. Dai, and J. Wang, “Cooperative deep reinforcement learning for largescale traffic grid signal control,” IEEE Trans. Cybern., vol. 50, no. 6, pp. 2687–2700, 2019.

[20] 
S. Jiang, Y. Huang, M. Jafari, and M. Jalayer, “A distributed multiagent reinforcement learning with graph decomposition approach for largescale adaptive traffic signal control,” IEEE Trans. Intelligent Transportation Systems, vol. 23, no. 9, pp. 14689–14701, 2023.

[21] 
L. Yan, L. Zhu, K. Song, Z. Yuan, Y. Yan, Y. Tang, and C. Peng, “Graph cooperation deep reinforcement learning for ecological urban traffic signal control,” Applied Intelligence, vol. 53, no. 6, pp. 6248–6265, 2023. doi: 10.1007/s1048902203208w

[22] 
H. Wei, N. Xu, H. Zhang, G. Zheng, X. Zang, C. Chen, W. Zhang, Y. Zhu, K. Xu, and Z. Li, “Colight: Learning networklevel cooperation for traffic signal control,” in Proc. 28th ACM Int. Conf. Information and Knowledge Management, 2019, pp. 1913–1922.

[23] 
L. Wu, M. Wang, D. Wu, and J. Wu, “DynSTGAT: DynAMIC spatialtemporal graph attention network for traffic signal control,” in Proc. 30th ACM Int. Conf. Inform. & Knowledge Management, 2021, pp. 2150–2159.

[24] 
H. Wei, C. Chen, G. Zheng, K. Wu, V. Gayah, K. Xu, and Z. Li, “PressLight: Learning max pressure control to coordinate traffic signals in arterial network,” in Proc. 25th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, 2019, pp. 1290–1298.

[25] 
S. Guicheng and W. Yang, “Review on DECPomdp model for MARL algorithms,” in Proc. Smart Communi., Intelligent Algorithms and Interactive Methods; 4th Int. Conf. Wireless Communi. and Appli., 2022, pp. 29–35.

[26] 
M. L. Littman, “Markov games as a framework for multiagent reinforcement learning,” in Machine Learning Proceedings 1994. New Brunswick, USA: Elsevier, 1994, pp. 157–163.

[27] 
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. London, UK: MIT Press, 2018.

[28] 
R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, “Multiagent actorcritic for mixed cooperativecompetitive environments,” in Proc. 31st Int. Conf. Neural Inform. Proc. Syst., 2017, vol. 30, pp. 6382–6393.

[29] 
P. Sunehag, G. Lever, A. Gruslys, et al., “Valuedecomposition networks for cooperative multiagent learning,” arXiv preprint arXiv: 1706.05296, 2017.

[30] 
T. Rashid, M. Samvelyan, C. Schroeder, G. Farquhar, J. Foerster, and S. Whiteson, “Qmix: Monotonic value function factorisation for deep multiagent reinforcement learning,” in Proc. Int. Conf. Machine Learning, 2018, pp. 4295–4304.

[31] 
C. Guestrin, D. Koller, and R. Parr, “Multiagent planning with factored MDPs,” in Proc. 14th Int. Conf. Neural Inform. Processing Syst.: Natural and Synthetic, 2001 vol. 14, pp. 1523–1530.

[32] 
G. Tesauro, “Extending Qlearning to general adaptive multiagent systems,” in Proc. 16th Int. Conf. Neural Inform. Proc. Syst., 2003 vol. 16, pp. 871–878.

[33] 
M. Tan, “Multiagent reinforcement learning: Independent vs. cooperative agents,” in Proc. 10th Int. Conf. Machine Learning, 1993, pp. 330–337.

[34] 
P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” Stat, vol. 150, p. 20, 2017.

[35] 
Q. Wang, B. Wu, P. Zhu, P. Li, and Q. Hu, “ECANet: Efficient channel attention for deep convolutional neural networks,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2020.

[36] 
H. Zhang, S. Feng, C. Liu, Y. Ding, Y. Zhu, Z. Zhou, W. Zhang, Y. Yu, H. Jin, and Z. Li, “CityFlow: A multiagent reinforcement learning environment for large scale city traffic scenario,” in Proc. World Wide Web Conf., 2019, pp. 3620–3624.

[37] 
P. Varaiya, The MaxPressure Controller for Arbitrary Networks of Signalized Intersections. New York, USA: Springer, 2013.

[38] 
T. Nishi, K. Otaki, K. Hayakawa, and T. Yoshimura, “Traffic signal control based on reinforcement learning with graph convolutional neural nets,” in Proc. 21st IEEE Int. Conf. Intelligent Transportation Systems, 2018.

[39] 
P. Zhou, X. Chen, Z. Liu, T. Braud, P. Hui, and J. Kangasharju, “DRLE: Decentralized reinforcement learning at the edge for traffic light control in the IOV,” IEEE Trans. Intelligent Transportation Systems, vol. 22, no. 4, pp. 2262–2273, 2020.
