[1] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 1998.
[2] Fernández F, García J, Veloso M. Probabilistic policy reuse for intertask transfer learning. Robotics and Autonomous Systems, 2010, 58(7): 866-871
[3] Lazaric A, Restelli M, Bonarini A. Transfer of samples in batch reinforcement learning. In: Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland: ACM, 2008. 544-551
[4] McGovern A, Barto A G. Automatic discovery of subgoals in reinforcement learning using diverse density. In: Proceedings of the 18th International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers, 2001. 361-368
[5] Pan S J, Tsang I W, Kwok J T, Yang Q. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 2011, 22(2): 199-210
[6] Taylor M E, Stone P. An introduction to intertask transfer for reinforcement learning. AI Magazine, 2011, 32(1): 15-34
[7] Zhu Mei-Qiang, Cheng Yu-Hu, Li Ming, Wang Xue-Song, Feng Huan-Ting. A hybrid transfer algorithm for reinforcement learning based on spectral method. Acta Automatica Sinica, 2012, 38(11): 1765-1776 (in Chinese)
[8] Watkins C, Dayan P. Q-learning. Machine Learning, 1992, 8(3): 279-292
[9] Rummery G A, Niranjan M. On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Department of Engineering, Cambridge University, Cambridge, UK, 1994
[10] Sutton R S, Precup D, Singh S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 1999, 112(1-2): 181-211
[11] Stolle M, Precup D. Learning options in reinforcement learning. In: Proceedings of the 5th International Symposium on Abstraction, Reformulation and Approximation. Kananaskis, Alberta, Canada: Springer, 2002. 212-223
[12] Şimşek Ö, Wolfe A P, Barto A G. Identifying useful subgoals in reinforcement learning by local graph partitioning. In: Proceedings of the 22nd International Conference on Machine Learning. Bonn, Germany: ACM, 2005. 816-823
[13] Taylor M E, Kuhlmann G, Stone P. Autonomous transfer for reinforcement learning. In: Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems. Estoril, Portugal: ACM, 2008. 283-290
[14] Chen F, Gao F, Chen S F, Ma Z D. Connect-based subgoal discovery for options in hierarchical reinforcement learning. In: Proceedings of the 3rd International Conference on Natural Computation (ICNC 2007). Haikou, China: IEEE, 2007. 698-702
[15] Ravindran B. An Algebraic Approach to Abstraction in Reinforcement Learning [Ph.D. dissertation], University of Massachusetts Amherst, USA, 2004
[16] Soni V, Singh S. Using homomorphisms to transfer options across continuous reinforcement learning domains. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06). Boston, Massachusetts: AAAI Press, 2006. 494-499
[17] Wolfe A P, Barto A G. Decision tree methods for finding reusable MDP homomorphisms. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06). Boston, Massachusetts: AAAI Press, 2006. 530-535
[18] Goel S, Huber M. Subgoal discovery for hierarchical reinforcement learning using learned policies. In: Proceedings of the 16th International FLAIRS Conference. California, USA: AAAI Press, 2003. 346-350
[19] Ferns N, Panangaden P, Precup D. Metrics for finite Markov decision processes. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. Banff, Canada: AUAI Press, 2004. 162-169
[20] Mehta N, Natarajan S, Tadepalli P, Fern A. Transfer in variable-reward hierarchical reinforcement learning. Machine Learning, 2008, 73(3): 289-312