An Adaptive Strategy via Reinforcement Learning for the Prisoner's Dilemma Game

Xue Lei; Changyin Sun; Donald Wunsch; Yingjiang Zhou; Yu Fang

doi:10.1109/JAS.2017.7510466

Volume 5 Issue 1

Jan. 2018

IEEE/CAA Journal of Automatica Sinica


Lei Xue, Changyin Sun, Donald Wunsch, Yingjiang Zhou and Fang Yu, "An Adaptive Strategy via Reinforcement Learning for the Prisoner's Dilemma Game," IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 301-310, Jan. 2018. doi: 10.1109/JAS.2017.7510466


Xue Lei^1, Changyin Sun^1, Donald Wunsch^2, Yingjiang Zhou^3, Yu Fang^4

1. Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, School of Automation, Southeast University, Nanjing 210096, China
2. Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO 65409, USA
3. College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
4. Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 210096, China

Funds:
- National Natural Science Foundation (NNSF) of China: 61603196, 61503079, 61520106009, 61533008
- Natural Science Foundation of Jiangsu Province of China: BK20150851
- China Postdoctoral Science Foundation: 2015M581842
- Jiangsu Postdoctoral Science Foundation: 1601259C
- Nanjing University of Posts and Telecommunications Science Foundation (NUPTSF): NY215011
- Priority Academic Program Development of Jiangsu Higher Education Institutions; open fund of the Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education: MCCSE2015B02
- Research Innovation Program for College Graduates of Jiangsu Province: CXLX1309


Author Bio:
Lei Xue received the B.S. and M.S. degrees from Heilongjiang University and Southeast University, China, in 2009 and 2012, respectively. He is currently a Ph.D. candidate in the School of Automation, Southeast University. His research interests include game theory, multi-agent systems, and optimization control. (e-mail: rayxue@seu.edu.cn)

Changyin Sun (M'17) is a Professor in the School of Automation, Southeast University, Nanjing, China. He received the bachelor's degree from the College of Mathematics, Sichuan University, China, and the M.S. and Ph.D. degrees in electrical engineering from Southeast University, Nanjing, China, in 2001 and 2004, respectively. He is an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, Neural Processing Letters, and IEEE/CAA Journal of Automatica Sinica. His research interests include intelligent control, flight control, pattern recognition, and optimal theory. (e-mail: cysun@seu.edu.cn)

Donald Wunsch (F'05) is the Mary K. Finley Missouri Distinguished Professor at Missouri University of Science and Technology. He received his B.S. and M.S. degrees from University of New Mexico and University of Washington, USA, respectively. He received his Ph.D. degree from University of Washington (Seattle), USA. His key research contributions are in: clustering/unsupervised learning; adaptive resonance and reinforcement learning architectures, hardware and applications; neurofuzzy regression; traveling salesman problem heuristics; robotic swarms; and bioinformatics. He is an IEEE Fellow and previous INNS President, INNS Fellow and Senior Fellow 2007-2013, NSF CAREER Award winner, and winner of the 2015 INNS Gabor Award. He served as IJCNN General Chair and on several boards, including the St. Patrick's School Board, IEEE Neural Networks Council, International Neural Networks Society, and the University of Missouri Bioinformatics Consortium, and chaired the Missouri University of Science and Technology Information Technology and Computing Committee as well as the Student Design and Experiential Learning Center Board. (e-mail: dwunsch@mst.edu)

Yingjiang Zhou received the M.S. degree from Hohai University in 2010 and the Ph.D. degree from Southeast University, China, in 2014. He is currently working at the College of Automation, Nanjing University of Posts and Telecommunications. His research interests include consensus of multi-agent systems, formation control of UAVs, and nonlinear system control. (e-mail: zhouyj@njupt.edu.cn)

Fang Yu received the B.S. and M.S. degrees from Huzhou Teachers College University and Henan University of Science and Technology, China, in 2006 and 2009, respectively. She received her Ph.D. degree from Kobe University, Japan, in 2013. She is currently a researcher at Shanghai Maritime University. Her research interests include game theory, supply chain management, and optimization control. (e-mail: fangyu1985@hotmail.com)

Corresponding author: Changyin Sun, e-mail: cysun@seu.edu.cn
Received Date: 2016-03-09
Accepted Date: 2016-10-31

Abstract

The iterated prisoner's dilemma (IPD) is an ideal model for analyzing interactions between agents in complex networks. It has attracted wide interest in the development of novel strategies since the success of tit-for-tat in Axelrod's tournament. This paper studies a new adaptive strategy for the IPD in different complex networks, where agents can learn and adapt their strategies through reinforcement learning. A temporal difference learning method is applied to design the adaptive strategy and optimize the agents' decision-making process. Previous studies indicated that mutual cooperation rarely emerges in the IPD. Therefore, three examples based on a square lattice network and a scale-free network are provided to show two features of the adaptive strategy. First, mutual cooperation can be achieved by a group of adaptive agents in a scale-free network, and once evolution has converged to mutual cooperation, it is unlikely to shift. Second, the adaptive strategy earns a better payoff than other strategies in the square lattice network. Analytical properties are discussed to verify the evolutionary stability of the adaptive strategy.
Keywords:
- Complex network
- prisoner's dilemma
- reinforcement learning
- temporal difference learning
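
To make the idea of a temporal difference learner in the IPD concrete, the following is a minimal sketch and not the strategy proposed in the paper. It assumes tabular Q-learning (one common temporal difference method), a single tit-for-tat opponent instead of a network of neighbors, and the conventional payoff values T=5, R=3, P=1, S=0. The names `PAYOFF`, `td_update`, and `train_against_tit_for_tat`, along with the learning rate, discount factor, and exploration rate, are hypothetical choices made for illustration.

```python
import random

C, D = 0, 1  # cooperate, defect

# PAYOFF[(my_move, opp_move)] -> my payoff.
# Assumed values: T=5, R=3, P=1, S=0 (not taken from the paper).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def td_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one one-step TD (Q-learning) update to q[state][action] in place."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def train_against_tit_for_tat(rounds=20000, alpha=0.1, gamma=0.9,
                              epsilon=0.1, seed=0):
    """Train an epsilon-greedy learner against tit-for-tat.

    Returns the Q-table and the mean payoff per round.
    """
    rng = random.Random(seed)
    # State = (my last move, opponent's last move); start as if both cooperated.
    q = {(a, b): [0.0, 0.0] for a in (C, D) for b in (C, D)}
    state = (C, C)
    total = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            action = rng.choice((C, D))  # explore
        else:
            action = C if q[state][C] >= q[state][D] else D  # exploit
        opp_action = state[0]  # tit-for-tat repeats the learner's last move
        reward = PAYOFF[(action, opp_action)]
        next_state = (action, opp_action)
        td_update(q, state, action, reward, next_state, alpha, gamma)
        state = next_state
        total += reward
    return q, total / rounds


if __name__ == "__main__":
    q, mean_payoff = train_against_tit_for_tat()
    for s, values in sorted(q.items()):
        print(s, [round(v, 2) for v in values])
    print("mean payoff per round:", round(mean_payoff, 2))
```

The state records both players' previous moves so that the tit-for-tat opponent's next move is determined by the state, which keeps the learning problem Markov. Extending the sketch to a lattice or scale-free network would require an interaction graph and per-neighbor play, neither of which is specified here.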

