Conflict-Aware Safe Reinforcement Learning: A Meta-Cognitive Learning Framework

Majid Mazouchi; Subramanya Nageshrao; Hamidreza Modares

doi:10.1109/JAS.2021.1004353

Volume 9 Issue 3

Mar. 2022

IEEE/CAA Journal of Automatica Sinica



M. Mazouchi, S. Nageshrao, and H. Modares, “Conflict-aware safe reinforcement learning: A meta-cognitive learning framework,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 3, pp. 466–481, Mar. 2022. doi: 10.1109/JAS.2021.1004353


1. Michigan State University, East Lansing, MI 48824 USA
2. Ford Research and Innovation Center, Ford Motor Company, Palo Alto, CA 94304 USA

Funds: This work was supported by the Ford Motor Company-Michigan State University Alliance.


Author Bio:
Majid Mazouchi received the B.Sc. degree from K. N. Toosi University of Technology, Iran, in 2007, and the M.Sc. and Ph.D. degrees from Ferdowsi University of Mashhad, Iran, in 2010 and 2018, respectively. He was a Senior Lecturer at Semnan University from 2017 to 2018. He is currently a Researcher in the Mechanical Engineering Department, Michigan State University, USA. His current research interests include multi-agent systems, safe reinforcement learning, cyber-physical systems, and distributed control. Dr. Mazouchi is a Guest Associate Editor of Frontiers in Control Engineering: Adaptive, Robust and Fault Tolerant Control, and he also serves as a Reviewer for several international journals and conferences, including IEEE Transactions on Automatic Control, IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Cybernetics, IEEE Control Systems Letters, and IEEE/CAA Journal of Automatica Sinica.

Subramanya P. Nageshrao received the B.E. degree in electronics and communication from Visvesvaraya Technological University, India, in 2005, the M.S. degree in mechatronics from Technische Universität Hamburg-Harburg, Germany, in 2011, and the Ph.D. degree from Delft University of Technology, The Netherlands, in 2016. From 2005 to 2009, he was a Software Engineer with Bosch Ltd., India. He is currently a Research Scientist with the Ford Research and Innovation Center, Ford Motor Company, USA. He is actively combining reinforcement-learning techniques with passivity-based control, with a major focus on autonomous driving. His current research interests include machine learning and nonlinear control.

Hamidreza Modares (Senior Member, IEEE) received the B.S. degree from the University of Tehran, Iran, in 2004, the M.S. degree from the Shahrood University of Technology, Iran, in 2006, and the Ph.D. degree from the University of Texas at Arlington, USA, in 2015. He was a Senior Lecturer with the Shahrood University of Technology from 2006 to 2009, and a Faculty Research Associate with the University of Texas at Arlington, USA, from 2015 to 2016. He is currently an Assistant Professor in the Mechanical Engineering Department, Michigan State University, USA. His current research interests include cyber-physical systems, reinforcement learning, distributed control, robotics, and machine learning. Dr. Modares was a Recipient of the Best Paper Award from the 2015 IEEE International Symposium on Resilient Control Systems. He is also an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems.
Corresponding author: Hamidreza Modares, e-mail: modaresh@msu.edu
Received Date: 2021-06-26
Revised Date: 2021-08-18
Accepted Date: 2021-09-30

Available Online: 2021-11-02

Abstract

In this paper, a data-driven conflict-aware safe reinforcement learning (CAS-RL) algorithm is presented for the control of autonomous systems. Existing safe RL results with predefined performance functions and safe sets can only provide safety and performance guarantees for a single environment or circumstance. By contrast, the presented CAS-RL algorithm provides safety and performance guarantees across a variety of circumstances that the system might encounter. This is achieved by utilizing a bilevel learning control architecture: a higher meta-cognitive layer leverages a data-driven receding-horizon attentional controller (RHAC) to adapt the relative attention given to the system's different safety and performance requirements, and a lower-layer RL controller designs control actuation signals for the system. The presented RHAC makes its meta decisions based on the reaction curve of the lower-layer RL controller using a meta-model or knowledge. More specifically, it leverages a prediction meta-model (PMM) which spans the space of all future meta trajectories using a given finite number of past meta trajectories. The RHAC adapts the system's aspiration towards performance metrics (e.g., performance weights) as well as safety boundaries to resolve conflicts that arise as mission scenarios develop. This guarantees safety and feasibility (i.e., performance boundedness) of the lower-layer RL-based control solution. It is shown that the interplay between the RHAC and the lower-layer RL controller is a bilevel optimization problem in which the leader (RHAC) operates at a lower rate than the follower (RL-based controller), and its solution guarantees feasibility and safety of the control solution. The effectiveness of the proposed framework is verified through a simulation example.
Keywords:
- Optimal control
- Receding-horizon attentional controller (RHAC)
- Reinforcement learning (RL)
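
The abstract states that the prediction meta-model spans future meta trajectories using a finite number of past ones. The page does not give the construction, so the following is only a minimal sketch of one common data-driven approach to this kind of prediction: stacking recorded trajectories into Hankel matrices and expressing a new trajectory as a linear combination of their columns. The toy scalar system, the function names (`hankel`, `simulate`, `predict`), and all numerical values are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def hankel(signal, depth):
    """Stack every length-`depth` window of a 1-D signal as a column."""
    signal = np.asarray(signal, dtype=float)
    cols = signal.size - depth + 1
    return np.column_stack([signal[i:i + depth] for i in range(cols)])


def simulate(a, b, x0, inputs):
    """Outputs y_k = x_k of the hypothetical toy system x_{k+1} = a*x_k + b*u_k."""
    x, ys = x0, []
    for u in inputs:
        ys.append(x)
        x = a * x + b * u
    return np.array(ys)


def predict(u_data, y_data, u_past, y_past, u_future):
    """Predict future outputs from recorded data only (no model parameters)."""
    t_ini, horizon = len(u_past), len(u_future)
    depth = t_ini + horizon
    H_u, H_y = hankel(u_data, depth), hankel(y_data, depth)
    # Known rows: all inputs over the window, plus the past outputs.
    known = np.vstack([H_u, H_y[:t_ini]])
    target = np.concatenate([u_past, u_future, y_past])
    # Find a combination of recorded windows that matches the known part.
    g, *_ = np.linalg.lstsq(known, target, rcond=None)
    # The same combination applied to the future-output rows is the prediction.
    return H_y[t_ini:] @ g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u_data = rng.standard_normal(40)            # recorded (exciting) inputs
    y_data = simulate(0.8, 0.5, 0.0, u_data)    # recorded outputs
    u_new = rng.standard_normal(6)
    y_new = simulate(0.8, 0.5, 1.0, u_new)      # ground truth for comparison
    y_hat = predict(u_data, y_data, u_new[:2], y_new[:2], u_new[2:])
    print(np.max(np.abs(y_hat - y_new[2:])))    # prediction error
```

The sketch relies on the recorded input being sufficiently exciting and on the toy system being linear; whether the paper's PMM makes the same assumptions cannot be determined from this page.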

Highlights

- The proposed algorithm provides safety and performance guarantees across a variety of circumstances that the system might encounter.
- A bilevel learning control architecture is utilized.
- A higher meta-cognitive layer leverages a data-driven receding-horizon attentional controller to adapt the relative attention given to the system's different safety and performance requirements.
- A lower-layer RL controller designs control actuation signals for the system.
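
The two-layer structure in the highlights, with a leader that acts at a lower rate than the follower, can be sketched as a simple two-rate loop. This is an illustrative toy under stated assumptions, not the paper's algorithm: a clipping rule stands in for the RHAC optimization, a proportional feedback law stands in for the learned RL policy, and the integrator dynamics, function names, and parameter values (`margin`, `gain`, `meta_period`) are all hypothetical.

```python
def upper_layer_update(requested_target, safety_bound, margin=0.1):
    """Stand-in for the meta-cognitive layer (NOT the paper's RHAC):
    if the requested target conflicts with the safety bound, lower the
    aspiration to the nearest value that keeps a margin from the bound."""
    limit = safety_bound - margin
    return max(-limit, min(limit, requested_target))


def lower_layer_action(x, aspiration, gain=0.5):
    """Stand-in for the lower-layer RL policy: proportional feedback."""
    return gain * (aspiration - x)


def run(requested_targets, safety_bound=1.0, meta_period=5, x0=0.0):
    """Simulate x_{k+1} = x_k + u_k with a slow leader and a fast follower."""
    x, aspiration = x0, x0
    states, aspirations = [], []
    for k, requested in enumerate(requested_targets):
        if k % meta_period == 0:
            # Leader: revises the aspiration only every `meta_period` steps.
            aspiration = upper_layer_update(requested, safety_bound)
        # Follower: computes a control action at every step.
        x = x + lower_layer_action(x, aspiration)
        states.append(x)
        aspirations.append(aspiration)
    return states, aspirations


if __name__ == "__main__":
    # The mission first asks for a safe target, then for one outside |x| <= 1.
    targets = [0.5] * 10 + [2.0] * 10
    states, aspirations = run(targets)
    print(max(abs(s) for s in states), aspirations[-1])
```

In this toy, the conflict between the requested target of 2.0 and the bound of 1.0 is resolved by the upper layer lowering the aspiration, while the lower layer keeps tracking whatever aspiration it was last given.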
