IEEE/CAA Journal of Automatica Sinica
Citation: A. Perrusquía and W. Guo, “Optimal control of nonlinear systems using experience inference human-behavior learning,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 1, pp. 90–102, Jan. 2023. doi: 10.1109/JAS.2023.123009
Safety-critical control is often trained in a simulated environment to mitigate risk. Subsequent migration of the biased controller requires further adjustments. In this paper, an experience-inference human-behavior learning scheme is proposed to solve the migration problem of optimal controllers applied to real-world nonlinear systems. The approach is inspired by the complementary properties exhibited by the hippocampus, neocortex, and striatum learning systems located in the brain. The hippocampus defines a physics-informed reference model of the real-world nonlinear system for experience inference, and the neocortex is the adaptive dynamic programming (ADP) or reinforcement learning (RL) algorithm that ensures optimal performance of the reference model. This optimal performance is inferred to the real-world nonlinear system by means of an adaptive neocortex/striatum control policy that forces the nonlinear system to behave as the reference model. Stability and convergence of the proposed approach are analyzed using Lyapunov stability theory. Simulation studies are carried out to verify the approach.
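To make the information flow between the three learning systems concrete, below is a minimal numerical sketch, not the paper's method: the hippocampus is modeled as a linearized-pendulum reference model, the neocortex policy comes from a direct Riccati solve standing in for ADP/RL, and the neocortex/striatum transfer uses a generic MRAC-style gradient update. All dynamics, gains, and rates are illustrative assumptions.

```python
# Minimal sketch of the three-system architecture described in the abstract.
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hippocampus: physics-informed reference model x_ref' = A x_ref + B u_ref
# (a linearized pendulum, assumed here for illustration).
A = np.array([[0.0, 1.0], [-9.81, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

# Neocortex: optimal policy for the reference model. The paper obtains this
# with ADP/RL; a direct Riccati solve stands in for it in this sketch.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)                   # u_ref = -K x_ref
Am = A - B @ K                                    # stable reference closed loop
Pm = solve_continuous_lyapunov(Am.T, -np.eye(2))  # Am' Pm + Pm Am = -I

def plant(x, u):
    """Real-world nonlinear pendulum with friction unknown to the designer."""
    theta, omega = x
    return np.array([omega, -9.81 * np.sin(theta) - 0.5 * omega + u])

# Neocortex/striatum: adaptive policy forcing the plant to behave as the
# reference model (hypothetical learning rate and integration step).
dt, gamma = 1e-3, 1.0
Ke = np.zeros((1, 2))                  # adaptive gain on the following error
x_ref = np.array([0.6, 0.0])
x = x_ref.copy()

for _ in range(20000):                 # 20 s of simulated time
    u_ref = (-K @ x_ref).item()        # optimal action on the reference model
    e = x_ref - x                      # model-following error
    u = u_ref + (Ke @ e).item()        # control inferred to the real plant
    Ke = Ke + gamma * np.outer(B.T @ Pm @ e, e) * dt  # Lyapunov-inspired update
    x_ref = x_ref + dt * (Am @ x_ref)  # reference model under its optimal policy
    x = x + dt * plant(x, u)           # real nonlinear system (Euler step)

print("final model-following error:", np.linalg.norm(x_ref - x))
```

The paper's actual scheme learns the reference-model policy online via ADP/RL and derives the neocortex/striatum adaptive law from its Lyapunov analysis; the sketch above only mirrors the flow of experience from the reference model to the real system.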