IEEE/CAA Journal of Automatica Sinica
Citation:  W. Z. Liu, L. Dong, D. Niu, and C. Y. Sun, “Efficient exploration for multiagent reinforcement learning via transferable successor features,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 9, pp. 1673–1686, Sept. 2022. doi: 10.1109/JAS.2022.105809 
[1] 
V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. A. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236

[2] 
K. Shao, Z. Tang, Y. Zhu, N. Li, and D. Zhao, “A survey of deep reinforcement learning in video games,” arXiv preprint arXiv: 1912.10944, 2019.

[3] 
T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv: 1509.02971, 2015.

[4] 
Y. Ouyang, L. Dong, L. Xue, and C. Sun, “Adaptive control based on neural networks for an uncertain 2-DOF helicopter system with input deadzone and output constraints,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 807–815, 2019. doi: 10.1109/JAS.2019.1911495

[5] 
X. Li, L. Dong, and C. Sun, “Data-based optimal tracking of autonomous nonlinear switching systems,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 1, pp. 227–238, 2020.

[6] 
Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi, “Target-driven visual navigation in indoor scenes using deep reinforcement learning,” in Proc. IEEE Int. Conf. Robotics and Automation, 2017, pp. 3357–3364.

[7] 
H. Li, Q. Zhang, and D. Zhao, “Deep reinforcement learning-based automatic exploration for navigation in unknown environment,” IEEE Trans. Neural Networks and Learning Systems, vol. 31, no. 6, pp. 2064–2076, 2019.

[8] 
M. G. Bellemare, S. Candido, S. Castro, J. Gong, M. C. Machado, S. Moitra, S. S. Ponda, and Z. Wang, “Autonomous navigation of stratospheric balloons using reinforcement learning,” Nature, vol. 588, no. 7836, pp. 77–82, 2020. doi: 10.1038/s41586-020-2939-8

[9] 
T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications,” IEEE Trans. Cybernetics, vol. 50, no. 9, pp. 3826–3839, 2020. doi: 10.1109/TCYB.2020.2977374

[10] 
L. Buşoniu, R. Babuška, and B. De Schutter, “Multi-agent reinforcement learning: An overview,” in Innovations in Multi-Agent Systems and Applications – 1, Berlin Heidelberg, Germany: Springer, 2010, pp. 183–221.

[11] 
P. Hernandez-Leal, M. Kaisers, T. Baarslag, and E. M. de Cote, “A survey of learning in multiagent environments: Dealing with non-stationarity,” arXiv preprint arXiv: 1707.09183, 2017.

[12] 
G. Papoudakis, F. Christianos, A. Rahman, and S. V. Albrecht, “Dealing with non-stationarity in multiagent deep reinforcement learning,” arXiv preprint arXiv: 1906.04737, 2019.

[13] 
J. Liu, Y. Zhang, Y. Yu, and C. Sun, “Fixed-time leader-follower consensus of networked nonlinear systems via event/self-triggered control,” IEEE Trans. Neural Networks and Learning Systems, vol. 31, no. 11, pp. 5029–5037, 2020. doi: 10.1109/TNNLS.2019.2957069

[14] 
W. Böhmer, T. Rashid, and S. Whiteson, “Exploration with unreliable intrinsic reward in multiagent reinforcement learning,” arXiv preprint arXiv: 1906.02138, 2019.

[15] 
M. Tan, “Multiagent reinforcement learning: Independent vs. cooperative agents,” in Proc. Int. Conf. Machine Learning, 1993, pp. 330–337.

[16] 
A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation and competition with deep reinforcement learning,” PLoS One, vol. 12, no. 4, p. e0172395, 2017.

[17] 
F. A. Oliehoek, M. T. Spaan, and N. Vlassis, “Optimal and approximate Q-value functions for decentralized POMDPs,” J. Artificial Intelligence Research, vol. 32, pp. 289–353, 2008. doi: 10.1613/jair.2447

[18] 
L. Kraemer and B. Banerjee, “Multiagent reinforcement learning as a rehearsal for decentralized planning,” Neurocomputing, vol. 190, pp. 82–94, 2016. doi: 10.1016/j.neucom.2016.01.031

[19] 
M. E. Taylor and P. Stone, “Transfer learning for reinforcement learning domains: A survey,” J. Machine Learning Research, vol. 10, no. 7, pp. 1633–1685, 2009.

[20] 
R. Laroche and M. Barlier, “Transfer reinforcement learning with shared dynamics,” in Proc. AAAI Conf. Artificial Intelligence, 2017, pp. 2147–2153.

[21] 
L. Steccanella, S. Totaro, D. Allonsius, and A. Jonsson, “Hierarchical reinforcement learning for efficient exploration and transfer,” arXiv preprint arXiv: 2011.06335, 2020.

[22] 
T. D. Kulkarni, A. Saeedi, S. Gautam, and S. J. Gershman, “Deep successor reinforcement learning,” arXiv preprint arXiv: 1606.02396, 2016.

[23] 
A. Barreto, W. Dabney, R. Munos, J. J. Hunt, T. Schaul, H. P. van Hasselt, and D. Silver, “Successor features for transfer in reinforcement learning,” in Proc. Advances in Neural Information Processing Systems, 2017, pp. 4055–4065.

[24] 
A. Barreto, D. Borsa, J. Quan, T. Schaul, D. Silver, M. Hessel, D. Mankowitz, A. Zidek, and R. Munos, “Transfer in deep reinforcement learning using successor features and generalised policy improvement,” in Proc. Int. Conf. Machine Learning, PMLR, 2018, pp. 501–510.

[25] 
A. Barreto, D. Borsa, S. Hou, G. Comanici, E. Aygun, P. Hamel, D. Toyama, S. Mourad, D. Silver, and D. Precup, “The option keyboard: Combining skills in reinforcement learning,” in Proc. Advances in Neural Information Processing Systems, 2019, pp. 13052–13062.

[26] 
A. Barreto, S. Hou, D. Borsa, D. Silver, and D. Precup, “Fast reinforcement learning with generalized policy updates,” Proc. National Academy of Sciences, vol. 117, no. 48, pp. 30079–30087, 2020. doi: 10.1073/pnas.1907370117

[27] 
L. Lehnert and M. L. Littman, “Successor features combine elements of model-free and model-based reinforcement learning,” J. Machine Learning Research, vol. 21, no. 196, pp. 1–53, 2020.

[28] 
R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Proc. Advances in Neural Information Processing Systems, 2017, pp. 6379–6390.

[29] 
P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. F. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, “Value-decomposition networks for cooperative multi-agent learning based on team reward,” in Proc. Int. Conf. Autonomous Agents and MultiAgent Systems, 2018, pp. 2085–2087.

[30] 
T. Rashid, M. Samvelyan, C. Schroeder, G. Farquhar, J. Foerster, and S. Whiteson, “QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning,” in Proc. Int. Conf. Machine Learning, 2018, pp. 4295–4304.

[31] 
H. Mao, Z. Zhang, Z. Xiao, and Z. Gong, “Modelling the dynamic joint policy of teammates with attention multiagent DDPG,” in Proc. Int. Conf. Autonomous Agents and MultiAgent Systems, 2019, pp. 1108–1116.

[32] 
F. L. da Silva, G. Warnell, A. H. R. Costa, and P. Stone, “Agents teaching agents: A survey on inter-agent transfer learning,” in Proc. Int. Conf. Autonomous Agents and Multiagent Systems, 2020, pp. 2165–2167.

[33] 
G. Boutsioukis, I. Partalas, and I. Vlahavas, “Transfer learning in multiagent reinforcement learning domains,” in Proc. European Workshop on Reinforcement Learning, Springer, 2011, pp. 249–260.

[34] 
F. L. Da Silva and A. H. R. Costa, “A survey on transfer learning for multiagent reinforcement learning systems,” J. Artificial Intelligence Research, vol. 64, pp. 645–703, 2019. doi: 10.1613/jair.1.11396

[35] 
T. Yang, W. Wang, H. Tang, J. Hao, Z. Meng, H. Mao, D. Li, W. Liu, C. Zhang, Y. Hu, Y. Chen, and C. Fan, “Transfer among agents: An efficient multiagent transfer learning framework,” arXiv preprint arXiv: 2002.08030, 2020.

[36] 
S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian, “Deep decentralized multi-task multi-agent reinforcement learning under partial observability,” in Proc. Int. Conf. Machine Learning, 2017, pp. 2681–2690.

[37] 
P. Dayan, “Improving generalization for temporal difference learning: The successor representation,” Neural Computation, vol. 5, no. 4, pp. 613–624, 1993. doi: 10.1162/neco.1993.5.4.613

[38] 
I. Momennejad, E. M. Russek, J. H. Cheong, M. M. Botvinick, N. D. Daw, and S. J. Gershman, “The successor representation in human reinforcement learning,” Nature Human Behaviour, vol. 1, no. 9, pp. 680–692, 2017. doi: 10.1038/s41562-017-0180-8

[39] 
T. Gupta, A. Mahajan, B. Peng, W. Böhmer, and S. Whiteson, “UneVEn: Universal value exploration for multi-agent reinforcement learning,” in Proc. Int. Conf. Machine Learning, 2021, pp. 3930–3941.

[40] 
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, USA: MIT Press, 2018.

[41] 
T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. Int. Conf. Machine Learning, 2018, pp. 1856–1865.

[42] 
M. L. Littman, “Markov games as a framework for multiagent reinforcement learning,” in Machine Learning Proceedings, San Francisco, USA: Elsevier, 1994, pp. 157–163.

[43] 
J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, “Counterfactual multiagent policy gradients,” in Proc. AAAI Conf. Artificial Intelligence, 2018, pp. 2974–2982.

[44] 
V. R. Konda and J. N. Tsitsiklis, “Actor-critic algorithms,” in Proc. Advances in Neural Information Processing Systems, 2000, pp. 1008–1014.

[45] 
Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015. doi: 10.1038/nature14539

[46] 
L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Trans. Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156–172, 2008.

[47] 
Y. Wang, L. Dong, and C. Sun, “Cooperative control for multi-player pursuit-evasion games with reinforcement learning,” Neurocomputing, vol. 412, pp. 101–114, 2020. doi: 10.1016/j.neucom.2020.06.031

[48] 
T. Madarasz and T. E. J. Behrens, “Better transfer learning with inferred successor maps,” in Proc. Advances in Neural Information Processing Systems, 2019, pp. 9026–9037.

[49] 
A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. Int. Conf. Machine Learning, 2013, vol. 30.
