Volume 10 Issue 8
Aug.  2023

IEEE/CAA Journal of Automatica Sinica

Citation: S. A. A. Rizvi, A. J. Pertzborn, and Z. Lin, “Development of a bias compensating Q-learning controller for a multi-zone HVAC facility,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 8, pp. 1704–1715, Aug. 2023. doi: 10.1109/JAS.2023.123624

Development of a Bias Compensating Q-Learning Controller for a Multi-Zone HVAC Facility

doi: 10.1109/JAS.2023.123624
Funds: This work was supported in part by NIST (70NANB18H161)
  • We present a bias compensating reinforcement learning (RL) algorithm that optimizes thermal comfort (by minimizing tracking error) and control utilization (by penalizing setpoint deviations) in a multi-zone heating, ventilation, and air-conditioning (HVAC) lab facility subject to unmeasurable disturbances and unknown dynamics. We show that an unmeasurable disturbance makes the learning equation of traditional RL controllers inconsistent, which leads to parameter estimation bias (even with integral action support) and, in the extreme case, to divergence of the learning algorithm. We demonstrate this issue by applying the popular Q-learning algorithm to linear quadratic regulation (LQR) of a multi-zone HVAC environment: even with integral support, the algorithm exhibits bias during the learning phase when the HVAC disturbance is unmeasurable because of unknown heat gains, occupancy variations, light sources, and outside weather changes. To address this difficulty, we present a bias compensating learning equation that learns a lumped bias term, arising from disturbances (and possibly other sources), together with the optimal control parameters. Experimental results show that the proposed scheme recovers the bias-free optimal control parameters without explicitly learning the dynamic model or estimating the disturbances, demonstrating the effectiveness of the algorithm in addressing the above challenges.
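The bias mechanism described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' algorithm: it assumes a scalar plant `x_next = a*x + b*u + d` with a constant unmeasured disturbance `d`, a fixed feedback gain `K`, a discount factor `gamma`, and a least-squares Q-function evaluation step, all of which are illustrative choices that do not come from the paper (integral action is omitted). It compares a purely quadratic Q-function basis with one augmented by linear and constant terms, which play the role of a lumped bias term.

```python
import numpy as np

def simulate(a, b, d, K, n_steps, rng):
    """Roll out x_next = a*x + b*u + d with u = -K*x + exploration noise."""
    x, rows = 0.0, []
    for _ in range(n_steps):
        u = -K * x + rng.normal()
        x_next = a * x + b * u + d      # d is never shown to the learner
        rows.append((x, u, x_next))
        x = x_next
    return np.array(rows)

def basis(x, u, with_bias):
    """Quadratic basis in (x, u); optionally add linear and constant terms."""
    cols = [x * x, x * u, u * u]
    if with_bias:
        cols += [x, u, np.ones_like(x)]
    return np.column_stack(cols)

def evaluate_policy(data, K, q, r, gamma, with_bias):
    """Least-squares solve of Q(x,u) = cost + gamma * Q(x_next, -K*x_next)."""
    x, u, x_next = data.T
    cost = q * x**2 + r * u**2
    phi = basis(x, u, with_bias)
    phi_next = basis(x_next, -K * x_next, with_bias)
    theta, *_ = np.linalg.lstsq(phi - gamma * phi_next, cost, rcond=None)
    return theta[:3]                    # coefficients of x^2, x*u, u^2

def disturbance_free_params(a, b, K, q, r, gamma):
    """Closed-form quadratic Q-function coefficients of the gain K when d = 0."""
    p = (q + r * K**2) / (1.0 - gamma * (a - b * K) ** 2)
    return np.array([q + gamma * p * a * a,
                     2.0 * gamma * p * a * b,
                     r + gamma * p * b * b])

# Hypothetical numbers chosen only for illustration.
a, b, d, K, q, r, gamma = 0.9, 0.5, 0.5, 0.5, 1.0, 1.0, 0.9
data = simulate(a, b, d, K, 200, np.random.default_rng(0))
target = disturbance_free_params(a, b, K, q, r, gamma)

naive_err = np.max(np.abs(evaluate_policy(data, K, q, r, gamma, False) - target))
comp_err = np.max(np.abs(evaluate_policy(data, K, q, r, gamma, True) - target))
print("max error, quadratic basis only:", naive_err)
print("max error, with bias terms     :", comp_err)
```

In this sketch the learner never sees `d` or the model parameters `a` and `b`. With the quadratic basis alone, the learning equation is inconsistent and the fitted coefficients drift away from the disturbance-free values; with the extra linear and constant terms, the same data yield the disturbance-free quadratic coefficients up to numerical precision.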





    Figures(16)  / Tables(4)



    • A bias compensating Q-learning algorithm to handle unmeasurable disturbances
    • Implementation aspects of Q-learning in a multi-zone HVAC facility
    • Robustness to disturbances arising from unknown heat gains and weather variations
    • Analysis of various exploration signals for reinforcement learning based HVAC control
    • Asymptotic temperature tracking and parameter convergence

