Citation: H. Feng, Y. Cheng, and X. Wang, “Offline generalized actor-critic with distance regularization,” IEEE/CAA J. Autom. Sinica, vol. 13, no. 1, pp. 57–71, Jan. 2026. doi: 10.1109/JAS.2025.125633