Citation: S. Wang, R. Zhang, Y. Zhou, J. Shao, and Y. Cheng, “QuadQ: Quadratic-based value decomposition for cooperative policy optimization in multi-agent reinforcement learning,” IEEE/CAA J. Autom. Sinica, early access, 2026. doi: 10.1109/JAS.2025.125666
[1] W. Zhou, D. Chen, J. Yan, Z. Li, H. Yin, and W. Ge, “Multi-agent reinforcement learning for cooperative lane changing of connected and autonomous vehicles in mixed traffic,” Autonomous Intelligent Systems, vol. 2, no. 1, p. 5, 2022.
[2] W. Cao, J. Yan, X. Yang, X. Luo, and X. Guan, “Communication-aware formation control of AUVs with model uncertainty and fading channel via integral reinforcement learning,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 1, pp. 159–176, 2023.
[3] L. Xia, Q. Li, R. Song, and H. Modares, “Optimal synchronization control of heterogeneous asymmetric input-constrained unknown nonlinear MASs via reinforcement learning,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 3, pp. 520–532, 2022.
[4] H. Zhang, Y. Li, Z. Wang, Y. Ding, and H. Yan, “Policy gradient adaptive dynamic programming for model-free multi-objective optimal control,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 4, pp. 1060–1062, 2024.
[5] P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, “Value-decomposition networks for cooperative multi-agent learning based on team reward,” in Proc. Int. Conf. Autonomous Agents and Multiagent Systems, 2018, pp. 2085–2087.
[6] T. Rashid, M. Samvelyan, C. S. de Witt, G. Farquhar, J. Foerster, and S. Whiteson, “Monotonic value function factorisation for deep multi-agent reinforcement learning,” J. Machine Learning Research, vol. 21, no. 178, pp. 1–51, 2020.
[7] K. Son, D. Kim, W. J. Kang, D. E. Hostallero, and Y. Yi, “QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning,” in Proc. Int. Conf. Machine Learning, 2019, pp. 5887–5896.
[8] J. Wang, Z. Ren, T. Liu, Y. Yu, and C. Zhang, “QPLEX: Duplex dueling multi-agent Q-learning,” in Proc. Int. Conf. Learning Representations, 2021, pp. 1–27.
[9] T. Rashid, G. Farquhar, B. Peng, and S. Whiteson, “Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 10199–10210, 2020.
[10] H. Li, H. Zhou, Y. Zou, D. Yu, and T. Lan, “ConcaveQ: Non-monotonic value function factorization via concave representations in deep multi-agent reinforcement learning,” in Proc. AAAI Conf. Artificial Intelligence, 2024, vol. 38, no. 16, pp. 17461–17468.
[11] M. Samvelyan, T. Rashid, C. S. de Witt, G. Farquhar, N. Nardelli, T. G. J. Rudner, C.-M. Hung, P. H. S. Torr, J. Foerster, and S. Whiteson, “The StarCraft multi-agent challenge,” in Proc. Int. Conf. Autonomous Agents and Multiagent Systems, 2019, pp. 2186–2188.