IEEE/CAA Journal of Automatica Sinica
Citation: O. Dogru, J. Xie, O. Prakash, R. Chiplunkar, J. Soesanto, H. Chen, K. Velswamy, F. Ibrahim, and B. Huang, “Reinforcement learning in process industries: Review and perspective,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 2, pp. 283–300, Feb. 2024. doi: 10.1109/JAS.2024.124227
This survey paper provides a review and perspective on intermediate and advanced reinforcement learning (RL) techniques in process industries. It offers a holistic approach by covering all levels of the process control hierarchy. The survey presents a comprehensive overview of RL algorithms, including fundamental concepts like Markov decision processes and different approaches to RL, such as value-based, policy-based, and actor-critic methods, while also discussing the relationship between classical control and RL. It further reviews the wide-ranging applications of RL in process industries, such as soft sensors, low-level control, high-level control, distributed process control, fault detection and fault-tolerant control, optimization, planning, scheduling, and supply chain. The survey discusses the limitations and advantages, trends and new applications, and opportunities and future prospects for RL in process industries. Moreover, it highlights the need for a holistic approach in complex systems due to the growing importance of digitalization in the process industries.