Reinforcement Learning Transfer Based on Subgoal Discovery and Subtask Similarity

Hao Wang; Shunguo Fan; Jinhua Song; Yang Gao; Xingguo Chen

Volume 1 Issue 3

Jul. 2014

IEEE/CAA Journal of Automatica Sinica



Hao Wang, Shunguo Fan, Jinhua Song, Yang Gao and Xingguo Chen, "Reinforcement Learning Transfer Based on Subgoal Discovery and Subtask Similarity," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 257-266, 2014.


1. Department of Computer Science and Technology, Nanjing University, China;
2. School of Computer Science and Technology, School of Software, Nanjing University of Posts and Telecommunications, China

Funds:

This work was supported by the National Science Foundation of China (61432008, 61175042, 61321491, 61403208) and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Abstract


This paper studies transfer learning in the context of reinforcement learning. We propose a novel transfer learning method that speeds up reinforcement learning with the aid of previously learnt tasks. Before performing extensive learning episodes, our method attempts to analyze the learning task through some exploration of the environment, and then reuses previous learning experience whenever it is possible and appropriate. The proposed method consists of four stages: 1) subgoal discovery, 2) option construction, 3) similarity searching, and 4) option reusing. To identify similar options, we propose a novel similarity measure between options, built upon the intuition that similar options have similar state-action probabilities. We evaluate our algorithm in extensive experiments, comparing it with existing methods. The results show that our method outperforms conventional non-transfer reinforcement learning algorithms, as well as existing transfer learning methods, by a wide margin.
Key words: Reinforcement learning, transfer learning, subtask, options, source task library
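
The intuition behind the similarity-searching stage, that similar options have similar state-action probabilities, can be illustrated with a minimal sketch. The abstract does not define the paper's actual measure, so everything below is an assumption made for illustration: options are represented as hypothetical mappings from states to action-probability vectors, states of the two options are taken to be already aligned, and similarity is computed as one minus the mean total variation distance between the action distributions. The names `OptionPolicy`, `option_similarity`, and `most_similar` are invented here and do not come from the paper.

```python
from typing import Dict, Hashable, Sequence

# Hypothetical representation: an option's policy maps each state to a
# probability distribution over a shared, fixed-order action set.
OptionPolicy = Dict[Hashable, Sequence[float]]


def option_similarity(a: OptionPolicy, b: OptionPolicy) -> float:
    """Return a similarity in [0, 1]; 1 means identical action probabilities.

    Assumption: the same key in both policies denotes corresponding states.
    Only states present in both policies are compared.
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    total = 0.0
    for s in shared:
        # Total variation distance between the two action distributions.
        total += 0.5 * sum(abs(p - q) for p, q in zip(a[s], b[s]))
    return 1.0 - total / len(shared)


def most_similar(target: OptionPolicy, library: Dict[str, OptionPolicy]) -> str:
    """Return the name of the library option most similar to the target."""
    return max(library, key=lambda name: option_similarity(target, library[name]))


# Toy example: two states, two actions, a two-entry source task library.
target = {0: [0.9, 0.1], 1: [0.2, 0.8]}
library = {
    "door": {0: [0.8, 0.2], 1: [0.3, 0.7]},
    "wall": {0: [0.1, 0.9], 1: [0.9, 0.1]},
}
best = most_similar(target, library)
```

In this toy library, `best` is `"door"`, because its action probabilities differ from the target's by only 0.1 in each state. A real implementation would also need a way to align states across tasks, which this sketch simply assumes.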


