IEEE/CAA Journal of Automatica Sinica
Citation: C. H. Liu, F. Zhu, Q. Liu, and Y. C. Fu, "Hierarchical Reinforcement Learning With Automatic Sub-Goal Identification," IEEE/CAA J. Autom. Sinica, vol. 8, no. 10, pp. 1686–1696, Oct. 2021. doi: 10.1109/JAS.2021.1004141
[1] 
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT press, 2018.

[2] 
V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236

[3] 
H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proc. 30th AAAI Conf. Artificial Intelligence, 2016, pp. 2094–2100.

[4] 
T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” in Proc. Advances in Int. Conf. Learning Representations, 2016, pp. 1–21.

[5] 
Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, and N. Freitas, “Dueling network architectures for deep reinforcement learning,” in Proc. Int. Conf. Machine Learning, 2016, pp. 1995–2003.

[6] 
R. S. Sutton, D. Precup, and S. Singh, “Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,” Artificial Intelligence, vol. 112, no. 1–2, pp. 181–211, 1999. doi: 10.1016/S0004-3702(99)00052-1

[7] 
T. D. Kulkarni, K. Narasimhan, A. Saeedi, and J. Tenenbaum, “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation,” Advances in Neural Information Processing Systems, 2016, pp. 3675–3683.

[8] 
M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, O. P. Abbeel, and W. Zaremba, “Hindsight experience replay,” in Advances in Neural Information Processing Systems, 2017, pp. 5048–5058.

[9] 
C. Florensa, Y. Duan, and P. Abbeel, “Stochastic neural networks for hierarchical reinforcement learning,” in Proc. Advances in Int. Conf. Learning Representations, 2017, pp. 1–17.

[10] 
H. Le, N. Jiang, A. Agarwal, M. Dudik, Y. Yue, and H. Daumé, “Hierarchical imitation and reinforcement learning,” in Proc. Int. Conf. Machine Learning, 2018, pp. 2923–2932.

[11] 
X. B. Peng, G. Berseth, K. Yin, and M. Van De Panne, “DeepLoco: Dynamic locomotion skills using hierarchical deep reinforcement learning,” ACM Trans. Graphics, vol. 36, no. 4, pp. 1–13, 2017.

[12] 
J. Rafati and D. C. Noelle, “Learning representations in model-free hierarchical reinforcement learning,” in Proc. AAAI Conf. Artificial Intelligence, 2019, pp. 10009–10010.

[13] 
M. Imani and U. M. Braga-Neto, “Control of gene regulatory networks using Bayesian inverse reinforcement learning,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 16, no. 4, pp. 1250–1261, 2019. doi: 10.1109/TCBB.2018.2830357

[14] 
N. Dilokthanakul, C. Kaplanis, N. Pawlowski, and M. Shanahan, “Feature control as intrinsic motivation for hierarchical reinforcement learning,” IEEE Trans. Neural Networks & Learning Systems, vol. 30, no. 11, pp. 3409–3418, 2019.

[15] 
H. Van Seijen, M. Fatemi, J. Romoff, R. Laroche, T. Barnes, and J. Tsang, “Hybrid reward architecture for reinforcement learning,” Advances in Neural Information Processing Systems, 2017, pp. 5392–5402.

[16] 
J. Yan, H. He, X. Zhong, and Y. Tang, “Q-learning-based vulnerability analysis of smart grid against sequential topology attacks,” IEEE Trans. Information Forensics & Security, vol. 12, no. 1, pp. 200–210, 2017.

[17] 
H. C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Trans. Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016. doi: 10.1109/TMI.2016.2528162

[18] 
B. Hengst, “Hierarchical reinforcement learning,” Encyclopedia of Machine Learning and Data Mining, pp. 611–619, 2017.

[19] 
R. E. Parr and S. Russell, Hierarchical Control and Learning for Markov Decision Processes. University of California, Berkeley, CA, 1998.

[20] 
R. Ramesh, M. Tomar, and B. Ravindran, “Successor options: An option discovery framework for reinforcement learning,” in Proc. 28th Int. Joint Conf. Artificial Intelligence, 2019, pp. 3304–3310.

[21] 
T. G. Dietterich, “Hierarchical reinforcement learning with the MAXQ value function decomposition,” Journal of Artificial Intelligence Research, vol. 13, pp. 227–303, 2000. doi: 10.1613/jair.639

[22] 
P. Kai, A. Escande, and A. Kheddar, “Singularity resolution in equality and inequality constrained hierarchical task-space control by adaptive nonlinear least-squares,” IEEE Robotics & Automation Letters, vol. 3, no. 4, pp. 3630–3637, 2018.

[23] 
D. Abel, D. Arumugam, L. Lehnert, and M. Littman, “State abstractions for lifelong reinforcement learning,” in Proc. Int. Conf. Machine Learning, 2018, pp. 10–19.

[24] 
Y. Fu, Z. Xu, F. Zhu, Q. Liu, and X. Zhou, “Learn to human-level control in dynamic environment using incremental batch interrupting temporal abstraction,” Computer Science & Information Systems, vol. 13, no. 2, pp. 561–577, 2016.

[25] 
A. Neitz, G. Parascandolo, S. Bauer, and B. Schölkopf, “Adaptive skip intervals: Temporal abstraction for recurrent dynamical models,” Advances in Neural Information Processing Systems, 2018, pp. 9816–9826.

[26] 
O. Nachum, S. S. Gu, H. Lee, and S. Levine, “Data-efficient hierarchical reinforcement learning,” Advances in Neural Information Processing Systems, 2018, pp. 3303–3313.

[27] 
J. Andreas, D. Klein, and S. Levine, “Modular multitask reinforcement learning with policy sketches,” in Proc. 34th Int. Conf. Machine Learning, vol. 70, 2017, pp. 166–175.

[28] 
I. Clavera, J. Rothfuss, J. Schulman, Y. Fujita, T. Asfour, and P. Abbeel, “Model-based reinforcement learning via meta-policy optimization,” in Proc. Conf. Robot Learning, 2018, pp. 617–629.

[29] 
C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. J. Cardoso, “Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2017, pp. 240–248.

[30] 
Z. Xu, H. P. van Hasselt, and D. Silver, “Meta-gradient reinforcement learning,” Advances in Neural Information Processing Systems, 2018, pp. 2396–2407.

[31] 
A. Garivier, P. Ménard, and G. Stoltz, “Explore first, exploit next: The true shape of regret in bandit problems,” Mathematics of Operations Research, vol. 44, no. 2, pp. 377–399, 2018.

[32] 
M. P. Saka, O. Hasancebi, and Z. W. Geem, “Metaheuristics in structural optimization and discussions on harmony search algorithm,” Swarm and Evolutionary Computation, vol. 28, pp. 88–97, 2016. doi: 10.1016/j.swevo.2016.01.005

[33] 
N. Heess, G. Wayne, D. Silver, T. Lillicrap, T. Erez, and Y. Tassa, “Learning continuous control policies by stochastic value gradients,” Advances in Neural Information Processing Systems, 2015, pp. 2944–2952.

[34] 
J. P. O’Doherty, S. W. Lee, and D. McNamee, “The structure of reinforcement-learning mechanisms in the human brain,” Current Opinion in Behavioral Sciences, vol. 1, pp. 94–100, 2015. doi: 10.1016/j.cobeha.2014.10.004

[35] 
A. G. Barto, “Intrinsic motivation and reinforcement learning,” in Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, 2013, pp. 17–47.

[36] 
P.-L. Bacon, J. Harb, and D. Precup, “The option-critic architecture,” in Proc. 31st AAAI Conf. Artificial Intelligence, 2017, pp. 1726–1734.

[37] 
Z. Zhao, Z. Yan, F. Li, M. Zhao, Z. Li, and S. Yan, “Discriminative sparse flexible manifold embedding with novel graph for robust visual representation and label propagation,” Pattern Recognition, vol. 61, pp. 492–510, 2017. doi: 10.1016/j.patcog.2016.07.042

[38] 
C. Wong, N. Houlsby, Y. Lu, and A. Gesmundo, “Transfer learning with neural AutoML,” Advances in Neural Information Processing Systems, 2018, pp. 8356–8365.

[39] 
G. D. Ruxton, “The unequal variance t-test is an underused alternative to Student’s t-test and the Mann-Whitney U test,” Behavioral Ecology, vol. 17, no. 4, pp. 688–690, 2006. doi: 10.1093/beheco/ark016

[40] 
J. C. F. De Winter, “Using the Student’s t-test with extremely small sample sizes,” Practical Assessment Research & Evaluation, vol. 18, no. 10, pp. 1–12, 2013.
