Timesharing-tracking Framework for Decentralized Reinforcement Learning in Fully Cooperative Multi-agent System

Xin Chen; Bo Fu; Yong He; Min Wu

Volume 1 Issue 2

Apr. 2014

IEEE/CAA Journal of Automatica Sinica

Xin Chen, Bo Fu, Yong He and Min Wu, "Timesharing-tracking Framework for Decentralized Reinforcement Learning in Fully Cooperative Multi-agent System," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 2, pp. 127-133, 2014.

Xin Chen^1,2,
Bo Fu^3,
Yong He^4,5,
Min Wu^4,5

1. School of Automation, China University of Geosciences, Wuhan 430074, China;
2. School of Information Science and Engineering, Central South University, Changsha 410083, China;
3. School of Information Science and Engineering, Central South University, Changsha 410083, China;
4. School of Automation, China University of Geosciences, Wuhan 430074, China;
5. School of Information Science and Engineering, Central South University, Changsha 410083, China

Funding:

This work was supported by the National Natural Science Foundation of China (61074058).

Abstract

Dimension-reduced and decentralized learning is widely regarded as an efficient way to solve high-dimensional multi-agent cooperative learning problems. However, the dynamic environment created by concurrent learning makes decentralized learning hard to converge and degrades its performance. To tackle this problem, this paper proposes a timesharing-tracking framework (TTF), which stems from the idea that alternating learning at the microscopic level results in concurrent learning at the macroscopic level. In TTF, joint-state best-response Q-learning (BRQ-learning) serves as the primary algorithm for adapting to the companions' policies. With a properly defined switching principle, TTF makes all agents learn best responses to the others at different joint states. Thus, from the viewpoint of the whole joint-state space, the agents learn the optimal cooperative policy simultaneously. Simulation results illustrate that the proposed algorithm learns the optimal joint behavior with less computation and at a faster speed than two other classical learning algorithms.

Keywords:
- Cooperative multi-agent system
- reinforcement learning
- immediate individual reward
- timesharing tracking
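
The abstract does not give the BRQ-learning update or the switching principle, so the Python sketch below only illustrates the timesharing idea under stated assumptions; it is not the authors' algorithm. It assumes two agents that share an immediate reward given by a payoff matrix at each joint state, joint states drawn uniformly at random, a Q-table per agent indexed by joint state and the agent's own action, and a round-robin switching rule that hands the learner role to the other agent after `phase_len` visits to a state. The names `timesharing_q_learning`, `argmax`, `phase_len`, `CLIMBING` and `COORD` are hypothetical.

```python
import random

# Hypothetical toy payoff matrices: entry [a0][a1] is the shared reward.
CLIMBING = [[11, -30, 0], [-30, 7, 6], [0, 0, 5]]
COORD = [[0, 1], [2, 10]]


def argmax(values):
    """Greedy action: index of the largest value (lowest index on ties)."""
    return values.index(max(values))


def timesharing_q_learning(games, steps=4000, phase_len=300,
                           alpha=0.5, epsilon=0.2, seed=0):
    """Two-agent timesharing sketch (hypothetical, not the paper's BRQ-learning).

    games[s][a0][a1] is the shared immediate reward at joint state s.
    At each joint state only one agent (the learner) explores and updates;
    the other plays its current greedy action, so the learner tracks a
    fixed partner. The learner role is handed over after `phase_len`
    visits to that state (an assumed round-robin switching rule).
    """
    rng = random.Random(seed)
    states = list(games)
    sizes = {s: (len(games[s]), len(games[s][0])) for s in states}
    # q[i][s][a]: agent i's value of its own action a at joint state s.
    q = [{s: [0.0] * sizes[s][i] for s in states} for i in range(2)]
    # Stagger the initial learner so different agents learn at different states.
    learner = {s: k % 2 for k, s in enumerate(states)}
    visits = {s: 0 for s in states}

    for _ in range(steps):
        s = rng.choice(states)  # assumption: joint states sampled uniformly
        i = learner[s]
        actions = [argmax(q[0][s]), argmax(q[1][s])]
        if rng.random() < epsilon:  # only the learner explores
            actions[i] = rng.randrange(sizes[s][i])
        r = games[s][actions[0]][actions[1]]
        # Immediate-reward Q update for the learner; the partner stays frozen.
        q[i][s][actions[i]] += alpha * (r - q[i][s][actions[i]])
        visits[s] += 1
        if visits[s] % phase_len == 0:
            learner[s] = 1 - i  # switching rule: hand over to the other agent

    # Greedy joint action learned at each joint state.
    return {s: (argmax(q[0][s]), argmax(q[1][s])) for s in states}


if __name__ == "__main__":
    print(timesharing_q_learning({"s0": CLIMBING, "s1": COORD}))
    # expected: {'s0': (0, 0), 's1': (1, 1)}
```

Because only the learner updates at a given joint state, its partner's behavior there is stationary during a phase, so the learner faces an ordinary single-agent problem. Staggering the initial learner means agent 0 learns at `s0` while agent 1 learns at `s1`, which is the sense in which alternating learning at each state adds up to concurrent learning across the joint-state space. A fixed round-robin handover behaves like coordinate ascent on the shared payoff: in these two toy games it reaches the optimal joint action, but in general it can settle on a suboptimal equilibrium, so this rule should not be read as a stand-in for the paper's switching principle.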
