Citation: L. Chen, L. Wang, L. Jin, and J. Wang, “LMAdam: Enhancing Adam via linear multistep discretization,” IEEE/CAA J. Autom. Sinica, vol. 13, no. 1, pp. 161–169, Jan. 2026. doi: 10.1109/JAS.2025.125834
[1] C. Pan, J. Peng, and Z. Zhang, “Depth-guided vision transformer with normalizing flows for monocular 3D object detection,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 3, pp. 673–689, Mar. 2024. doi: 10.1109/JAS.2023.123660
[2] Y. Zha, Y. Yang, R. Li, and Z. Hu, “Text alignment is an efficient unified model for massive NLP tasks,” in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, vol. 36, 2024.
[3] Y. Liu, B. Tian, Y. Lv, L. Li, and F.-Y. Wang, “Point cloud classification using content-based transformer via clustering in feature space,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 1, pp. 2066–2074, Jan. 2024.
[4] Z. Zhang, Z. Lei, M. Omura, H. Hasegawa, and S. Gao, “Dendritic learning-incorporated vision transformer for image recognition,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 2, pp. 539–541, Feb. 2024. doi: 10.1109/JAS.2023.123978
[5] W. Xu, C. Zhao, J. Cheng, Y. Wang, Y. Tang, T. Zhang, Z. Yuan, Y. Lv, and F.-Y. Wang, “Transformer-based macroscopic regulation for high-speed railway timetable rescheduling,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 9, pp. 1822–1833, Sept. 2023. doi: 10.1109/JAS.2023.123501
[6] Y. Demidovich, G. Malinovsky, I. Sokolov, and P. Richtárik, “A guide through the zoo of biased SGD,” in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, vol. 36, 2024.
[7] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learn. Represent., San Diego, CA, USA, May 2015.
[8] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Jul. 2011.
[9] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, vol. 4, no. 2, pp. 26–31, May 2012.
[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Minneapolis, MN, USA, 2019, pp. 4171–4186.
[11] W. Zeng et al., “PanGu-α: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation,” 2021. [Online]. Available: arXiv: 2104.12369.
[12] E. Nijkamp, B. Pang, H. Hayashi, L. Tu, H. Wang, Y. Zhou, S. Savarese, and C. Xiong, “CodeGen: An open large language model for code with multi-turn program synthesis,” in Proc. Int. Conf. Learn. Represent., Kigali, Rwanda, 2023.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proc. Adv. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 5998–6008.
[14] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proc. Int. Conf. Learn. Represent., New Orleans, LA, USA, 2019.
[15] M. Zaheer, S. Reddi, D. Sachan, S. Kale, and S. Kumar, “Adaptive methods for nonconvex optimization,” in Proc. Adv. Neural Inf. Process. Syst., Montreal, QC, Canada, 2018, pp. 9815–9825.
[16] S. J. Reddi, S. Kale, and S. Kumar, “On the convergence of Adam and beyond,” in Proc. Int. Conf. Learn. Represent., Vancouver, BC, Canada, 2018, pp. 1–23.
[17] X. Xie, P. Zhou, H. Li, Z. Lin, and S. Yan, “Adan: Adaptive Nesterov momentum algorithm for faster optimizing deep models,” 2022. [Online]. Available: arXiv: 2208.06677.
[18] D. F. Griffiths and D. J. Higham, Numerical Methods for Ordinary Differential Equations: Initial Value Problems. London, U.K.: Springer, 2010.
[19] A. Défossez, L. Bottou, F. Bach, and N. Usunier, “A simple convergence proof of Adam and Adagrad,” Trans. Mach. Learn. Res., vol. 2022, no. 1, pp. 1–30, Oct. 2020.
[20] Z. Xie, X. Wang, H. Zhang, I. Sato, and M. Sugiyama, “Adaptive inertia: Disentangling the effects of adaptive learning rate and momentum,” in Proc. Int. Conf. Mach. Learn., Baltimore, MD, USA, 2022, pp. 24430–24459.
[21] W. E, “A proposal on machine learning via dynamical systems,” Commun. Math. Stat., vol. 5, no. 1, pp. 1–11, Mar. 2017.
[22] Y. Lu, A. Zhong, Q. Li, and B. Dong, “Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations,” in Proc. Int. Conf. Mach. Learn., Stockholm, Sweden, July 2018, pp. 3276–3285.
[23] R. T. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neural ordinary differential equations,” in Proc. Adv. Neural Inf. Process. Syst., Montreal, QC, Canada, vol. 31, 2018.
[24] L. Chen, L. Jin, and M. Shang, “Zero stability well predicts performance of convolutional neural networks,” in Proc. AAAI Conf. Artif. Intell., 2022, pp. 6268–6277.
[25] K. Atkinson, W. Han, and D. E. Stewart, Numerical Solution of Ordinary Differential Equations. Hoboken, NJ, USA: Wiley, 2009.
[26] Y. Zhang, L. Jin, D. Guo, Y. Yin, and Y. Chou, “Taylor-type 1-step-ahead numerical differentiation rule for first-order derivative approximation and ZNN discretization,” J. Comput. Appl. Math., vol. 273, pp. 29–40, Jan. 2015. doi: 10.1016/j.cam.2014.05.027
[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, 2016, pp. 770–778.
[28] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent., San Diego, CA, USA, 2015.
[29] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Honolulu, HI, USA, 2017, pp. 4700–4708.
[30] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Commun. ACM, vol. 63, no. 11, pp. 139–144, Nov. 2020. doi: 10.1145/3422622
[31] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Mach. Learn., Sydney, Australia, 2017, pp. 214–223.
[32] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of Wasserstein GANs,” in Proc. Adv. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, vol. 30, pp. 5769–5779.
[33] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in Proc. Int. Conf. Learn. Represent., Vancouver, BC, Canada, 2018, pp. 1–26.
[34] J. Zhuang, T. Tang, Y. Ding, S. Tatikonda, N. Dvornek, X. Papademetris, and J. S. Duncan, “AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients,” in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 18795–18806.
[35] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Dec. 1997. doi: 10.1162/neco.1997.9.8.1735
[36] M. Liu and M. Shang, “On RNN-based k-WTA models with time-dependent inputs,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 11, pp. 2034–2036, Nov. 2022. doi: 10.1109/JAS.2022.105932
[37] Y. Lin, Y. Meng, X. Sun, Q. Han, K. Kuang, J. Li, and F. Wu, “BertGCN: Transductive text classification by combining GCN and BERT,” in Proc. ACL, 2021, pp. 1456–1462.