Development of a Bias Compensating Q-Learning Controller for a Multi-Zone HVAC Facility

Syed Ali Asad Rizvi; Amanda J. Pertzborn; Zongli Lin

doi:10.1109/JAS.2023.123624

Volume 10 Issue 8

Aug. 2023

IEEE/CAA Journal of Automatica Sinica



S. A. A. Rizvi, A. J. Pertzborn, and Z. Lin, “Development of a bias compensating Q-learning controller for a multi-zone HVAC facility,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 8, pp. 1704–1715, Aug. 2023. doi: 10.1109/JAS.2023.123624


Funds: This work was supported in part by NIST (70NANB18H161).


Author Bio:
Syed Ali Asad Rizvi (Member, IEEE) received the B.E. degree in industrial electronics from the Institute of Industrial Electronics Engineering, NED University of Engineering and Technology, Pakistan, in 2012, the M.S. degree in electrical engineering from the National University of Sciences and Technology, Pakistan, in 2014, and the Ph.D. degree in electrical engineering from the University of Virginia, USA, in 2020. He was a Postdoctoral Fellow at the National Institute of Standards and Technology (NIST), USA (2020–2021). He is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Tennessee Technological University, USA. His current research interests include optimal control, reinforcement learning, distributed learning and control, and their application in autonomous systems including autonomous vehicles, building HVAC systems, energy systems, and electric drives.

Amanda J. Pertzborn received the Ph.D. degree from the University of Wisconsin-Madison. She is currently the PI of the Intelligent Building Agents project in the Building Energy and Environment Division at the National Institute of Standards and Technology (NIST), USA. Her research focuses on intelligent control of heating, ventilation, and air conditioning (HVAC) systems.

Zongli Lin (Fellow, IEEE) is the Ferman W. Perry Professor in the School of Engineering and Applied Science and a Professor of electrical and computer engineering at the University of Virginia. He received the B.S. degree in mathematics and computer science from Xiamen University, in 1983, the master of engineering degree in automatic control from Chinese Academy of Space Technology, in 1989, and the Ph.D. degree in electrical and computer engineering from Washington State University, USA, in 1994. His current research interests include nonlinear control, robust control, and control applications. He was an Associate Editor of IEEE Transactions on Automatic Control (2001–2003), IEEE/ASME Transactions on Mechatronics (2006–2009), and IEEE Control Systems Magazine (2005–2012). He was elected a Member of the Board of Governors of the IEEE Control Systems Society (2008–2010, 2019–2021) and chaired the IEEE Control Systems Society Technical Committee on Nonlinear Systems and Control (2013–2015). He has served on the operating committees of several conferences. He was the program chair of the 2018 American Control Conference and a general chair of the 13th and 16th International Symposium on Magnetic Bearings, held in 2012 and 2018, respectively. He currently serves on the editorial boards of several journals and book series, including Automatica, Systems & Control Letters, and the Springer/Birkhäuser book series Control Engineering. He is a Fellow of IEEE, CAA, the International Federation of Automatic Control (IFAC), and the American Association for the Advancement of Science (AAAS).
Corresponding author: Zongli Lin, e-mail: zl5y@virginia.edu
Received Date: 2023-03-06
Revised Date: 2023-04-12
Accepted Date: 2023-04-18

Available Online: 2023-05-26

Abstract


We present the development of a bias compensating reinforcement learning (RL) algorithm that optimizes thermal comfort (by minimizing tracking error) and control utilization (by penalizing setpoint deviations) in a multi-zone heating, ventilation, and air-conditioning (HVAC) lab facility subject to unmeasurable disturbances and unknown dynamics. It is shown that the presence of unmeasurable disturbances results in an inconsistent learning equation in traditional RL controllers, leading to parameter estimation bias (even with integral action support) and, in the extreme case, divergence of the learning algorithm. We demonstrate this issue by applying the popular Q-learning algorithm to linear quadratic regulation (LQR) of a multi-zone HVAC environment and showing that, even with integral support, the algorithm exhibits a bias issue during the learning phase when the HVAC disturbance is unmeasurable due to unknown heat gains, occupancy variations, light sources, and outside weather changes. To address this difficulty, we present a bias compensating learning equation that learns a lumped bias term, arising from disturbances (and possibly other sources), in conjunction with the optimal control parameters. Experimental results show that the proposed scheme not only recovers the bias-free optimal control parameters but does so without explicitly learning the dynamic model or estimating the disturbances, demonstrating the effectiveness of the algorithm in addressing the above challenges.

Keywords:
- HVAC control
- optimal tracking
- Q-learning
- reinforcement learning (RL)
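
The bias mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' learning equation or a model of the HVAC facility. It assumes a hypothetical scalar plant x[k+1] = a*x[k] + b*u[k] + d with a constant unmeasured disturbance d, a discounted quadratic cost, and a fixed feedback gain; all numerical values and function names are illustrative. Under these assumptions the Q-function of the policy contains linear and constant terms in addition to the quadratic kernel H, so a purely quadratic least-squares fit is inconsistent, while appending linear and constant regressors (a lumped bias) recovers the disturbance-free H without estimating a, b, or d.

```python
import numpy as np


def simulate(a, b, d, k_gain, n_steps, rng):
    """Roll out the assumed scalar plant under u = -k_gain*x plus exploration noise."""
    x = np.zeros(n_steps + 1)
    u = np.zeros(n_steps)
    for t in range(n_steps):
        u[t] = -k_gain * x[t] + rng.normal()
        x[t + 1] = a * x[t] + b * u[t] + d  # d is never shown to the learner
    return x, u


def features(x, u, with_bias):
    """Quadratic basis [x^2, 2xu, u^2], optionally augmented with [x, u, 1]."""
    cols = [x * x, 2.0 * x * u, u * u]
    if with_bias:
        cols += [x, u, np.ones_like(x)]
    return np.stack(cols, axis=1)


def fit_q_kernel(x, u, k_gain, q_w, r_w, gamma, with_bias):
    """Least-squares policy evaluation from Q(x,u) = cost + gamma*Q(x', -k_gain*x')."""
    x_now, x_next = x[:-1], x[1:]
    cost = q_w * x_now**2 + r_w * u**2
    phi_now = features(x_now, u, with_bias)
    phi_next = features(x_next, -k_gain * x_next, with_bias)
    theta, *_ = np.linalg.lstsq(phi_now - gamma * phi_next, cost, rcond=None)
    # Only the first three parameters form the quadratic kernel H.
    return np.array([[theta[0], theta[1]], [theta[1], theta[2]]])


def reference_kernel(a, b, k_gain, q_w, r_w, gamma):
    """Disturbance-free kernel of the same policy (uses the model; for checking only)."""
    a_cl = a - b * k_gain
    p = (q_w + r_w * k_gain**2) / (1.0 - gamma * a_cl**2)
    return np.array([[q_w + gamma * p * a * a, gamma * p * a * b],
                     [gamma * p * a * b, r_w + gamma * p * b * b]])


# Illustrative values only (assumptions, not taken from the paper).
a, b, d = 0.9, 0.5, 0.5
k_gain, q_w, r_w, gamma = 0.5, 1.0, 0.1, 0.9

rng = np.random.default_rng(0)
x, u = simulate(a, b, d, k_gain, n_steps=500, rng=rng)

h_ref = reference_kernel(a, b, k_gain, q_w, r_w, gamma)
h_plain = fit_q_kernel(x, u, k_gain, q_w, r_w, gamma, with_bias=False)
h_comp = fit_q_kernel(x, u, k_gain, q_w, r_w, gamma, with_bias=True)

err_plain = np.max(np.abs(h_plain - h_ref))
err_comp = np.max(np.abs(h_comp - h_ref))
print("max kernel error, quadratic basis only:", err_plain)
print("max kernel error, with lumped bias terms:", err_comp)
```

In this toy setting the quadratic-only fit should return a kernel that visibly differs from the reference, while the augmented fit should match it to numerical precision. A policy-improvement step would then use only the recovered quadratic kernel together with the learned linear term.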




Highlights

- A bias compensating Q-learning algorithm to handle unmeasurable disturbances
- Implementation aspects of Q-learning in a multi-zone HVAC facility
- Robustness to disturbances arising from unknown heat gains and weather variations
- Analysis of various exploration signals for reinforcement learning based HVAC control
- Asymptotic temperature tracking and parameter convergence
