 Volume 9
							Issue 3
								
						IEEE/CAA Journal of Automatica Sinica
| Citation: | M. Mazouchi, S. Nageshrao, and H. Modares, “Conflict-aware safe reinforcement learning: A meta-cognitive learning framework,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 3, pp. 466–481, Mar. 2022. doi: 10.1109/JAS.2021.1004353 | 
 
| [1] | R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018. | 
| [2] | T. Bian and Z.-P. Jiang, “Reinforcement learning for linear continuous-time systems: An incremental learning approach,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 2, pp. 433–440, 2019. doi: 10.1109/JAS.2019.1911390 | 
| [3] | A. Faust, P. Ruymgaart, M. Salman, R. Fierro, and L. Tapia, “Continuous action reinforcement learning for control-affine systems with unknown dynamics,” IEEE/CAA J. Autom. Sinica, vol. 1, no. 3, pp. 323–336, 2014. doi: 10.1109/JAS.2014.7004690 | 
| [4] | G. Chowdhary, M. Liu, R. Grande, T. Walsh, J. How, and L. Carin, “Off-policy reinforcement learning with Gaussian processes,” IEEE/CAA J. Autom. Sinica, vol. 1, no. 3, pp. 227–238, 2014. doi: 10.1109/JAS.2014.7004680 | 
| [5] | Z. Wang, Q. Wei, and D. Liu, “Event-triggered adaptive dynamic programming for discrete-time multi-player games,” Inf. Sci., vol. 506, pp. 457–470, 2020. doi: 10.1016/j.ins.2019.05.071 | 
| [6] | Z. Cao, C. Lin, M. Zhou, and R. Huang, “Scheduling semiconductor testing facility by using cuckoo search algorithm with reinforcement learning and surrogate modeling,” IEEE Trans. Autom. Sci. Eng., vol. 16, no. 2, pp. 825–837, 2018. | 
| [7] | Z. Cao, Q. Xiao, and M. Zhou, “Distributed fusion-based policy search for fast robot locomotion learning,” IEEE Comput. Intell. Mag., vol. 14, no. 3, pp. 19–28, 2019. doi: 10.1109/MCI.2019.2919364 | 
| [8] | T. Liu, B. Tian, Y. Ai, and F.-Y. Wang, “Parallel reinforcement learning-based energy efficiency improvement for a cyber-physical system,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 2, pp. 617–626, 2020. doi: 10.1109/JAS.2020.1003072 | 
| [9] | B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 6, pp. 2042–2062, 2017. | 
| [10] | N. Horie, T. Matsui, K. Moriyama, A. Mutoh, and N. Inuzuka, “Multiobjective safe reinforcement learning,” Artificial Life and Robotics, pp. 1–9, 2019. | 
| [11] | H. B. Ammar, E. Eaton, P. Ruvolo, and M. Taylor, “Online multitask learning for policy gradient methods,” in Proc. Int. Conf. Machine Learning. PMLR, 2014, pp. 1206–1214. | 
| [12] | Y. W. Teh, V. Bapst, W. M. Czarnecki, J. Quan, J. Kirkpatrick, R. Hadsell, N. Heess, and R. Pascanu, “Distral: Robust multitask reinforcement learning,” arXiv preprint arXiv: 1707.04175, 2017. | 
| [13] | Z. Xu and U. Topcu, “Transfer of temporal logic formulas in reinforcement learning,” in Proc. 28th Int. Joint Conf. Artificial Intelligence, vol. 28, 2019, pp. 4010–4018. | 
| [14] | R. Laroche and M. Barlier, “Transfer reinforcement learning with shared dynamics,” in Proc. 31st AAAI Conf. Artificial Intelligence, vol. 31, no. 1, 2017, pp. 2147–2153. | 
| [15] | N. Mehta, S. Natarajan, P. Tadepalli, and A. Fern, “Transfer in variable-reward hierarchical reinforcement learning,” Machine Learning, vol. 73, no. 3, pp. 289–312, 2008. doi: 10.1007/s10994-008-5061-y | 
| [16] | M. Mazouchi, Y. Yang, and H. Modares, “Data-driven dynamic multiobjective optimal control: An aspiration-satisfying reinforcement learning approach,” IEEE Trans. Neural Netw. Learn. Syst., 2021. | 
| [17] | W. Ying, Y. Zhang, J. Huang, and Q. Yang, “Transfer learning via learning to transfer,” in Proc. Int. Conf. Machine Learning. PMLR, 2018, pp. 5085–5094. | 
| [18] | M. E. Taylor and P. Stone, “Transfer learning for reinforcement learning domains: A survey,” J. Mach. Learn. Res., vol. 10, no. 7, pp. 1633–1685, 2009. | 
| [19] | A. Lazaric, “Transfer in reinforcement learning: A framework and a survey,” in Adaptation, Learning, and Optimization. Springer Berlin Heidelberg, 2012, pp. 143–173. | 
| [20] | H. Wang, S. Fan, J. Song, Y. Gao, and X. Chen, “Reinforcement learning transfer based on subgoal discovery and subtask similarity,” IEEE/CAA J. Autom. Sinica, vol. 1, no. 3, pp. 257–266, 2014. doi: 10.1109/JAS.2014.7004683 | 
| [21] | T. Lew, A. Sharma, J. Harrison, and M. Pavone, “Safe learning and control using meta-learning,” in RSS Workshop Robust Autonomy, 2019. | 
| [22] | Z. Yang, K. Merrick, L. Jin, and H. A. Abbass, “Hierarchical deep reinforcement learning for continuous action control,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 11, pp. 5174–5184, 2018. doi: 10.1109/TNNLS.2018.2805379 | 
| [23] | X. Chen, A. Ghadirzadeh, M. Björkman, and P. Jensfelt, “Meta-learning for multi-objective reinforcement learning,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots and Systems, 2019, pp. 977–983. | 
| [24] | S. East, M. Gallieri, J. Masci, J. Koutník, and M. Cannon, “Infinite-horizon differentiable model predictive control,” arXiv preprint arXiv: 2001.02244, 2020. | 
| [25] | B. Amos, I. D. J. Rodriguez, J. Sacks, B. Boots, and J. Z. Kolter, “Differentiable MPC for end-to-end planning and control,” arXiv preprint arXiv: 1810.13400, 2018. | 
| [26] | Z. Li, J. Kiseleva, and M. de Rijke, “Dialogue generation: From imitation learning to inverse reinforcement learning,” in Proc. AAAI Conf. Artificial Intelligence, vol. 33, no. 1, 2019, pp. 6722–6729. | 
| [27] | R. Kamalapurkar, “Linear inverse reinforcement learning in continuous time and space,” in Proc. IEEE Annu. American Control Conf., 2018, pp. 1683–1688. | 
| [28] | J. Rawlings and D. Mayne, Model Predictive Control: Theory and Design. Nob Hill Publishing, Madison, WI, 1999. | 
| [29] | E. F. Camacho and C. B. Alba, Model Predictive Control. Springer Science & Business Media, 2013. | 
| [30] | J. Rafati and D. C. Noelle, “Learning representations in model-free hierarchical reinforcement learning,” in Proc. AAAI Conf. Artificial Intelligence, vol. 33, no. 1, 2019, pp. 10009–10010. | 
| [31] | Z. Hou, J. Fei, Y. Deng, and J. Xu, “Data-efficient hierarchical reinforcement learning for robotic assembly control applications,” IEEE Trans. Ind. Electron., vol. 68, no. 11, pp. 11565–11575, 2021. | 
| [32] | Y. Yang, K. G. Vamvoudakis, and H. Modares, “Safe reinforcement learning for dynamical games,” Int. J. Robust Nonlinear Control, vol. 30, no. 9, pp. 3706–3726, 2020. doi: 10.1002/rnc.4962 | 
| [33] | R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks,” in Proc. AAAI Conf. Artificial Intelligence, vol. 33, no. 1, 2019, pp. 3387–3395. | 
| [34] | X. Li and C. Belta, “Temporal logic guided safe reinforcement learning using control barrier functions,” arXiv preprint arXiv: 1903.09885, 2019. | 
| [35] | Y. Yang, Y. Yin, W. He, K. G. Vamvoudakis, H. Modares, and D. C. Wunsch, “Safety-aware reinforcement learning framework with an actor-critic-barrier structure,” in Proc. IEEE Annu. American Control Conf., 2019, pp. 2352–2358. | 
| [36] | Y. Yang, K. G. Vamvoudakis, H. Modares, Y. Yin, and D. C. Wunsch, “Safe intermittent reinforcement learning with static and dynamic event generators,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 12, pp. 5441–5455, 2020. doi: 10.1109/TNNLS.2020.2967871 | 
| [37] | Z. Marvi and B. Kiumarsi, “Safe reinforcement learning: A control barrier function optimization approach,” Int. J. Robust Nonlinear Control, pp. 1923–1940, 2021. | 
| [38] | T. L. Vu, S. Mukherjee, R. Huang, and Q. Huang, “Barrier function-based safe reinforcement learning for emergency control of power systems,” arXiv preprint arXiv: 2103.14186, 2021. | 
| [39] | Z. Cai, H. Cao, W. Lu, L. Zhang, and H. Xiong, “Safe multi-agent reinforcement learning through decentralized multiple control barrier functions,” arXiv preprint arXiv: 2103.12553, 2021. | 
| [40] | A. Kanellopoulos, F. Fotiadis, C. Sun, Z. Xu, K. G. Vamvoudakis, U. Topcu, and W. E. Dixon, “Temporal-logic-based intermittent, optimal, and safe continuous-time learning for trajectory tracking,” arXiv preprint arXiv: 2104.02547, 2021. | 
| [41] | M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, and U. Topcu, “Safe reinforcement learning via shielding,” in Proc. AAAI Conf. Artificial Intelligence, vol. 32, no. 1, 2018. | 
| [42] | W. Zhang, O. Bastani, and V. Kumar, “MAMPS: Safe multi-agent reinforcement learning via model predictive shielding,” arXiv preprint arXiv: 1910.12639, 2019. | 
| [43] | O. Bastani, “Safe reinforcement learning with nonlinear dynamics via model predictive shielding,” in Proc. IEEE American Control Conf., 2021, pp. 3488–3494. | 
| [44] | N. Jansen, B. Könighofer, S. Junges, A. C. Serban, and R. Bloem, “Safe reinforcement learning via probabilistic shields,” arXiv preprint arXiv: 1807.06096, 2018. | 
| [45] | S. Li and O. Bastani, “Robust model predictive shielding for safe reinforcement learning with stochastic dynamics,” in Proc. IEEE Int. Conf. Robotics and Automation, 2020, pp. 7166–7172. | 
| [46] | D. Grbic and S. Risi, “Safe reinforcement learning through meta-learned instincts,” in Proc. Artificial Life Conf. MIT Press, 2020, pp. 283–291. | 
| [47] | K. P. Wabersich and M. N. Zeilinger, “A predictive safety filter for learning-based control of constrained nonlinear dynamical systems,” arXiv preprint arXiv: 1812.05506, 2018. | 
| [48] | A. Mustafa, M. Mazouchi, S. Nageshrao, and H. Modares, “Assured learning-enabled autonomy: A metacognitive reinforcement learning framework,” Int. J. Adapt. Control Signal Process. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/acs.3326 | 
| [49] | S. Lyashevskiy, “Constrained optimization and control of nonlinear systems: New results in optimal control,” in Proc. 35th IEEE Conf. Decision and Control, vol. 1, 1996, pp. 541–546. | 
| [50] | H. Modares and F. L. Lewis, “Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning,” Automatica, vol. 50, no. 7, pp. 1780–1792, 2014. doi: 10.1016/j.automatica.2014.05.011 | 
| [51] | J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Syst. Control Lett., vol. 54, no. 4, pp. 325–329, 2005. | 
| [52] | J. Berberich and F. Allgöwer, “A trajectory-based framework for data-driven system analysis and control,” in Proc. IEEE European Control Conf., 2020, pp. 1365–1370. | 
| [53] | I. Markovsky, J. C. Willems, P. Rapisarda, and B. L. De Moor, “Algorithms for deterministic balanced subspace identification,” Automatica, vol. 41, no. 5, pp. 755–766, 2005. doi: 10.1016/j.automatica.2004.10.007 | 
| [54] | R. Sepulchre, M. Jankovic, and P. V. Kokotovic, Constructive Nonlinear Control. Springer Science & Business Media, 2012. | 
| [55] | C. Feller and C. Ebenbauer, “Relaxed logarithmic barrier function based model predictive control of linear systems,” IEEE Trans. Automat. Contr., vol. 62, no. 3, pp. 1223–1238, 2016. |