IEEE/CAA Journal of Automatica Sinica
Citation: W. Z. Liu, L. Dong, D. Niu, and C. Y. Sun, “Efficient exploration for multi-agent reinforcement learning via transferable successor features,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 9, pp. 1673–1686, Sept. 2022. doi: 10.1109/JAS.2022.105809
[1] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. A. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236
[2] K. Shao, Z. Tang, Y. Zhu, N. Li, and D. Zhao, “A survey of deep reinforcement learning in video games,” arXiv preprint arXiv:1912.10944, 2019.
[3] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
[4] Y. Ouyang, L. Dong, L. Xue, and C. Sun, “Adaptive control based on neural networks for an uncertain 2DOF helicopter system with input deadzone and output constraints,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 807–815, 2019. doi: 10.1109/JAS.2019.1911495
[5] X. Li, L. Dong, and C. Sun, “Data-based optimal tracking of autonomous nonlinear switching systems,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 1, pp. 227–238, 2020.
[6] Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi, “Target-driven visual navigation in indoor scenes using deep reinforcement learning,” in Proc. IEEE Int. Conf. Robotics and Automation, 2017, pp. 3357–3364.
[7] H. Li, Q. Zhang, and D. Zhao, “Deep reinforcement learning-based automatic exploration for navigation in unknown environment,” IEEE Trans. Neural Networks and Learning Systems, vol. 31, no. 6, pp. 2064–2076, 2019.
[8] M. G. Bellemare, S. Candido, S. Castro, J. Gong, M. C. Machado, S. Moitra, S. S. Ponda, and Z. Wang, “Autonomous navigation of stratospheric balloons using reinforcement learning,” Nature, vol. 588, no. 7836, pp. 77–82, 2020. doi: 10.1038/s41586-020-2939-8
[9] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications,” IEEE Trans. Cybernetics, vol. 50, no. 9, pp. 3826–3839, 2020. doi: 10.1109/TCYB.2020.2977374
[10] L. Buşoniu, R. Babuška, and B. De Schutter, “Multi-agent reinforcement learning: An overview,” in Innovations in Multi-Agent Systems and Applications-1, Berlin, Germany: Springer, 2010, pp. 183–221.
[11] P. Hernandez-Leal, M. Kaisers, T. Baarslag, and E. M. de Cote, “A survey of learning in multiagent environments: Dealing with non-stationarity,” arXiv preprint arXiv:1707.09183, 2017.
[12] G. Papoudakis, F. Christianos, A. Rahman, and S. V. Albrecht, “Dealing with non-stationarity in multiagent deep reinforcement learning,” arXiv preprint arXiv:1906.04737, 2019.
[13] J. Liu, Y. Zhang, Y. Yu, and C. Sun, “Fixed-time leader-follower consensus of networked nonlinear systems via event/self-triggered control,” IEEE Trans. Neural Networks and Learning Systems, vol. 31, no. 11, pp. 5029–5037, 2020. doi: 10.1109/TNNLS.2019.2957069
[14] W. Böhmer, T. Rashid, and S. Whiteson, “Exploration with unreliable intrinsic reward in multi-agent reinforcement learning,” arXiv preprint arXiv:1906.02138, 2019.
[15] M. Tan, “Multi-agent reinforcement learning: Independent vs. cooperative agents,” in Proc. Int. Conf. Machine Learning, 1993, pp. 330–337.
[16] A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation and competition with deep reinforcement learning,” PLoS One, vol. 12, no. 4, p. e0172395, 2017.
[17] F. A. Oliehoek, M. T. Spaan, and N. Vlassis, “Optimal and approximate Q-value functions for decentralized POMDPs,” J. Artificial Intelligence Research, vol. 32, pp. 289–353, 2008. doi: 10.1613/jair.2447
[18] L. Kraemer and B. Banerjee, “Multi-agent reinforcement learning as a rehearsal for decentralized planning,” Neurocomputing, vol. 190, pp. 82–94, 2016. doi: 10.1016/j.neucom.2016.01.031
[19] M. E. Taylor and P. Stone, “Transfer learning for reinforcement learning domains: A survey,” J. Machine Learning Research, vol. 10, no. 7, pp. 1633–1685, 2009.
[20] R. Laroche and M. Barlier, “Transfer reinforcement learning with shared dynamics,” in Proc. AAAI Conf. Artificial Intelligence, 2017, pp. 2147–2153.
[21] L. Steccanella, S. Totaro, D. Allonsius, and A. Jonsson, “Hierarchical reinforcement learning for efficient exploration and transfer,” arXiv preprint arXiv:2011.06335, 2020.
[22] T. D. Kulkarni, A. Saeedi, S. Gautam, and S. J. Gershman, “Deep successor reinforcement learning,” arXiv preprint arXiv:1606.02396, 2016.
[23] A. Barreto, W. Dabney, R. Munos, J. J. Hunt, T. Schaul, H. P. van Hasselt, and D. Silver, “Successor features for transfer in reinforcement learning,” in Proc. Advances in Neural Information Processing Systems, 2017, pp. 4055–4065.
[24] A. Barreto, D. Borsa, J. Quan, T. Schaul, D. Silver, M. Hessel, D. Mankowitz, A. Zidek, and R. Munos, “Transfer in deep reinforcement learning using successor features and generalised policy improvement,” in Proc. Int. Conf. Machine Learning, 2018, pp. 501–510.
[25] A. Barreto, D. Borsa, S. Hou, G. Comanici, E. Aygun, P. Hamel, D. Toyama, S. Mourad, D. Silver, and D. Precup, “The option keyboard: Combining skills in reinforcement learning,” in Proc. Advances in Neural Information Processing Systems, 2019, pp. 13052–13062.
[26] A. Barreto, S. Hou, D. Borsa, D. Silver, and D. Precup, “Fast reinforcement learning with generalized policy updates,” Proc. National Academy of Sciences, vol. 117, no. 48, pp. 30079–30087, 2020. doi: 10.1073/pnas.1907370117
[27] L. Lehnert and M. L. Littman, “Successor features combine elements of model-free and model-based reinforcement learning,” J. Machine Learning Research, vol. 21, no. 196, pp. 1–53, 2020.
[28] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Proc. Advances in Neural Information Processing Systems, 2017, pp. 6379–6390.
[29] P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. F. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, “Value-decomposition networks for cooperative multi-agent learning based on team reward,” in Proc. Int. Conf. Autonomous Agents and Multiagent Systems, 2018, pp. 2085–2087.
[30] T. Rashid, M. Samvelyan, C. Schroeder, G. Farquhar, J. Foerster, and S. Whiteson, “QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning,” in Proc. Int. Conf. Machine Learning, 2018, pp. 4295–4304.
[31] H. Mao, Z. Zhang, Z. Xiao, and Z. Gong, “Modelling the dynamic joint policy of teammates with attention multiagent DDPG,” in Proc. Int. Conf. Autonomous Agents and Multiagent Systems, 2019, pp. 1108–1116.
[32] F. L. da Silva, G. Warnell, A. H. R. Costa, and P. Stone, “Agents teaching agents: A survey on inter-agent transfer learning,” in Proc. Int. Conf. Autonomous Agents and Multiagent Systems, 2020, pp. 2165–2167.
[33] G. Boutsioukis, I. Partalas, and I. Vlahavas, “Transfer learning in multi-agent reinforcement learning domains,” in Proc. European Workshop on Reinforcement Learning, Springer, 2011, pp. 249–260.
[34] F. L. da Silva and A. H. R. Costa, “A survey on transfer learning for multiagent reinforcement learning systems,” J. Artificial Intelligence Research, vol. 64, pp. 645–703, 2019. doi: 10.1613/jair.1.11396
[35] T. Yang, W. Wang, H. Tang, J. Hao, Z. Meng, H. Mao, D. Li, W. Liu, C. Zhang, Y. Hu, Y. Chen, and C. Fan, “Transfer among agents: An efficient multiagent transfer learning framework,” arXiv preprint arXiv:2002.08030, 2020.
[36] S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian, “Deep decentralized multi-task multi-agent reinforcement learning under partial observability,” in Proc. Int. Conf. Machine Learning, 2017, pp. 2681–2690.
[37] P. Dayan, “Improving generalization for temporal difference learning: The successor representation,” Neural Computation, vol. 5, no. 4, pp. 613–624, 1993. doi: 10.1162/neco.1993.5.4.613
[38] I. Momennejad, E. M. Russek, J. H. Cheong, M. M. Botvinick, N. D. Daw, and S. J. Gershman, “The successor representation in human reinforcement learning,” Nature Human Behaviour, vol. 1, no. 9, pp. 680–692, 2017. doi: 10.1038/s41562-017-0180-8
[39] T. Gupta, A. Mahajan, B. Peng, W. Böhmer, and S. Whiteson, “UneVEn: Universal value exploration for multi-agent reinforcement learning,” in Proc. Int. Conf. Machine Learning, 2021, pp. 3930–3941.
[40] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, USA: MIT Press, 2018.
[41] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. Int. Conf. Machine Learning, 2018, pp. 1856–1865.
[42] M. L. Littman, “Markov games as a framework for multi-agent reinforcement learning,” in Proc. Int. Conf. Machine Learning, 1994, pp. 157–163.
[43] J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, “Counterfactual multi-agent policy gradients,” in Proc. AAAI Conf. Artificial Intelligence, 2018, pp. 2974–2982.
[44] V. R. Konda and J. N. Tsitsiklis, “Actor-critic algorithms,” in Proc. Advances in Neural Information Processing Systems, 2000, pp. 1008–1014.
[45] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015. doi: 10.1038/nature14539
[46] L. Buşoniu, R. Babuška, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Trans. Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156–172, 2008.
[47] Y. Wang, L. Dong, and C. Sun, “Cooperative control for multi-player pursuit-evasion games with reinforcement learning,” Neurocomputing, vol. 412, pp. 101–114, 2020. doi: 10.1016/j.neucom.2020.06.031
[48] T. Madarasz and T. E. J. Behrens, “Better transfer learning with inferred successor maps,” in Proc. Advances in Neural Information Processing Systems, 2019, pp. 9026–9037.
[49] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. Int. Conf. Machine Learning, 2013, vol. 30.