A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 1 Issue 3
Jul. 2014

IEEE/CAA Journal of Automatica Sinica

Citation: Hao Wang, Shunguo Fan, Jinhua Song, Yang Gao and Xingguo Chen, "Reinforcement Learning Transfer Based on Subgoal Discovery and Subtask Similarity," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 257-266, 2014.

Reinforcement Learning Transfer Based on Subgoal Discovery and Subtask Similarity

Funds:

This work was supported by the National Science Foundation of China (61432008, 61175042, 61321491, 61403208) and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

  • This paper studies transfer learning in the context of reinforcement learning. We propose a novel transfer learning method that speeds up reinforcement learning with the aid of previously learnt tasks. Before performing extensive learning episodes, our method attempts to analyze the learning task through some exploration of the environment, and then to reuse previous learning experience whenever it is possible and appropriate. The proposed method consists of four stages: 1) subgoal discovery, 2) option construction, 3) similarity searching, and 4) option reusing. To identify similar options, we propose a novel similarity measure between options, built upon the intuition that similar options have similar state-action probabilities. We evaluate our algorithm in extensive experiments, comparing it with existing methods. The results show that our method outperforms conventional non-transfer reinforcement learning algorithms, as well as existing transfer learning methods, by a wide margin.
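
The intuition that similar options have similar state-action probabilities can be illustrated with a small sketch. The abstract does not define the measure itself, so everything below is an assumption made for illustration: the `OptionPolicy` representation, the function names `option_similarity` and `most_similar`, the use of total variation distance, and the premise that states of the two options have already been put into correspondence. It is not the measure proposed in the paper.

```python
from typing import Dict, Hashable, Sequence

# Hypothetical representation (not from the paper): an option's policy maps
# each state to a probability distribution over a shared, fixed-order action set.
OptionPolicy = Dict[Hashable, Sequence[float]]


def option_similarity(p: OptionPolicy, q: OptionPolicy) -> float:
    """Return a similarity in [0, 1] between two option policies.

    Assumption: similarity is one minus the mean total variation distance
    between the action distributions, taken over the states both options
    cover. Options with no shared states get similarity 0.0.
    """
    shared = set(p) & set(q)
    if not shared:
        return 0.0
    total = 0.0
    for s in shared:
        if len(p[s]) != len(q[s]):
            raise ValueError("action distributions must have equal length")
        # Total variation distance is half the L1 distance.
        total += 0.5 * sum(abs(a - b) for a, b in zip(p[s], q[s]))
    return 1.0 - total / len(shared)


def most_similar(target: OptionPolicy, library: Dict[str, OptionPolicy]) -> str:
    """Return the name of the stored option most similar to the target."""
    if not library:
        raise ValueError("option library is empty")
    return max(library, key=lambda name: option_similarity(target, library[name]))


if __name__ == "__main__":
    new_option = {0: [1.0, 0.0], 1: [0.5, 0.5]}
    library = {
        "door_left": {0: [0.0, 1.0], 1: [0.5, 0.5]},
        "door_right": {0: [1.0, 0.0], 1: [0.5, 0.5]},
    }
    print(most_similar(new_option, library))
```

In this sketch, the similarity-searching stage reduces to a lookup over a library of stored options. A real system would also need the state mapping between source and target tasks, which is assumed as given here.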

     

