A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 13, Issue 1, Jan. 2026

IEEE/CAA Journal of Automatica Sinica

L. Chen, L. Wang, L. Jin, and J. Wang, “LMAdam: Enhancing Adam via linear multistep discretization,” IEEE/CAA J. Autom. Sinica, vol. 13, no. 1, pp. 161–169, Jan. 2026. doi: 10.1109/JAS.2025.125834

LMAdam: Enhancing Adam via Linear Multistep Discretization

doi: 10.1109/JAS.2025.125834
Funds: This work was supported in part by the National Natural Science Foundation of China (62506148 and 62476115), the Fundamental Research Funds for the Central Universities (lzujbky-2025-pd05 and lzujbky-2025-ytB01), the Research Grants Council of the Hong Kong Special Administrative Region of China (AoE/E-407/24-N and C1013-24G), the Postdoctoral Fellowship Program (Grade C) of China Postdoctoral Science Foundation (GZC20251039), and the Supercomputing Center of Lanzhou University.
  • In this paper, we propose a learning algorithm termed linear multistep adaptive moment (LMAdam) to enhance the adaptive moment (Adam) algorithm for machine learning. Viewing Adam as a single-step discretization of its continuous counterpart, we develop LMAdam based on a linear multistep discretization scheme. We design a feedforward neural network to learn the coefficients of the multistep terms with ensured consistency, and we select the coefficients to ensure zero stability of the multistep terms. Extensive experiments on benchmark datasets, covering the training of various deep neural networks in three applications, demonstrate the superiority of LMAdam. A minimal illustrative sketch of this discretization view follows the abstract.
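To make the single-step versus multistep contrast concrete, below is a minimal NumPy sketch (an illustration only, not the paper's implementation): adam_step is the standard Adam update, which advances from the latest iterate alone, while multistep_adam_step is a hypothetical two-step variant whose fixed coefficients (1.5, -0.5) are assumed purely for illustration; LMAdam instead learns such coefficients with a feedforward network under consistency and zero-stability constraints.

import numpy as np

def adam_step(theta, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update: a single-step discretization advancing from the latest iterate only."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def multistep_adam_step(theta_hist, m, v, grad, t, coeffs=(1.5, -0.5),
                        lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical two-step variant: the new iterate combines the two most recent
    iterates before subtracting the Adam step. The fixed coefficients here are
    illustrative; LMAdam learns them subject to stability constraints."""
    assert abs(sum(coeffs) - 1.0) < 1e-12, "consistency: coefficients must sum to 1"
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    step = lr * m_hat / (np.sqrt(v_hat) + eps)
    theta_new = coeffs[0] * theta_hist[-1] + coeffs[1] * theta_hist[-2] - step
    return theta_new, m, v

A caller keeps the two most recent iterates in theta_hist and appends each new one; with coeffs=(1.0, 0.0) the multistep update reduces to standard Adam.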

     

