IEEE/CAA Journal of Automatica Sinica, Volume 12, Issue 12

Citation: Q. Hu, H. Wu, and X. Luo, “A comprehensive review of parallel optimization algorithms for high-dimensional and incomplete matrix factorization,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 12, pp. 2399–2426, Dec. 2025. doi: 10.1109/JAS.2025.125774
[1]
X. Wang, D. Bo, C. Shi, S. Fan, Y. Ye, and P. S. Yu, “A survey on heterogeneous graph embedding: Methods, techniques, applications and sources,” IEEE Trans. Big Data, vol. 9, no. 2, pp. 415–436, Apr. 2023. doi: 10.1109/TBDATA.2022.3177455
[2]
J. Lü, G. Wen, R. Lu, Y. Wang, and S. Zhang, “Networked knowledge and complex networks: An engineering view,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 8, pp. 1366–1383, Aug. 2022. doi: 10.1109/JAS.2022.105737
[3]
X. Luo, L. Wang, P. Hu, and L. Hu, “Predicting protein-protein interactions using sequence and network information via variational graph autoencoder,” IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 20, no. 5, pp. 3182–3194, Sep.-Oct. 2023. doi: 10.1109/TCBB.2023.3273567
[4]
L. Hu, X. Wang, Y.-A. Huang, P. Hu, and Z.-H. You, “A survey on computational models for predicting protein-protein interactions,” Brief. Bioinform., vol. 22, no. 5, p. bbab036, Sep. 2021. doi: 10.1093/bib/bbab036
[5]
H. Liu, C. Zheng, D. Li, X. Shen, K. Lin, J. Wang, Z. Zhang, Z. Zhang, and N. N. Xiong, “EDMF: Efficient deep matrix factorization with review feature learning for industrial recommender system,” IEEE Trans. Ind. Inf., vol. 18, no. 7, pp. 4361–4371, Jul. 2022. doi: 10.1109/TII.2021.3128240
[6]
X. Luo, W. Qin, A. Dong, K. Sedraoui, and M. C. Zhou, “Efficient and high-quality recommendations via momentum-incorporated parallel stochastic gradient descent-based learning,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 2, pp. 402–411, Feb. 2021. doi: 10.1109/JAS.2020.1003396
[7]
G. Zou, J. Chen, Q. He, K.-C. Li, B. Zhang, and Y. Gan, “NDMF: Neighborhood-integrated deep matrix factorization for service QoS prediction,” IEEE Trans. Netw. Serv. Manage., vol. 17, no. 4, pp. 2717–2730, Dec. 2020. doi: 10.1109/TNSM.2020.3027185
[8]
D. Roy and M. Dutta, “A systematic review and research perspective on recommender systems,” J. Big Data, vol. 9, no. 1, p. 59, May 2022. doi: 10.1186/s40537-022-00592-5
[9]
Y. Yuan, Q. He, X. Luo, and M. Shang, “A multilayered-and-randomized latent factor model for high-dimensional and sparse matrices,” IEEE Trans. Big Data, vol. 8, no. 3, pp. 784–794, Jun. 2022. doi: 10.1109/TBDATA.2020.2988778
[10]
K. Berahmand, M. Mohammadi, F. Saberi-Movahed, Y. Li, and Y. Xu, “Graph regularized nonnegative matrix factorization for community detection in attributed networks,” IEEE Trans. Netw. Sci. Eng., vol. 10, no. 1, pp. 372–385, Jan.-Feb. 2023. doi: 10.1109/TNSE.2022.3210233
[11]
W. Qin, X. Luo, and M. C. Zhou, “Adaptively-accelerated parallel stochastic gradient descent for high-dimensional and incomplete data representation learning,” IEEE Trans. Big Data, vol. 10, no. 1, pp. 92–107, Feb. 2024. doi: 10.1109/TBDATA.2023.3326304
[12]
H. Li, K. Li, J. An, and K. Li, “An online and scalable model for generalized sparse nonnegative matrix factorization in industrial applications on multi-GPU,” IEEE Trans. Ind. Inf., vol. 18, no. 1, pp. 437–447, Jan. 2022. doi: 10.1109/TII.2019.2896634
[13]
Z. Chen and S. Wang, “A review on matrix completion for recommender systems,” Knowl. Inf. Syst., vol. 64, pp. 1–34, Jan. 2022. doi: 10.1007/s10115-021-01629-6
[14]
Z. Cheng, C. Yan, F.-X. Wu, and J. Wang, “Drug-target interaction prediction using multi-head self-attention and graph attention network,” IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 19, no. 4, pp. 2208–2218, Jul.-Aug. 2022. doi: 10.1109/TCBB.2021.3077905
[15]
Y. Li, W. Liang, L. Peng, D. Zhang, C. Yang, and K.-C. Li, “Predicting drug-target interactions via dual-stream graph neural network,” IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 21, no. 4, pp. 948–958, Jul.-Aug. 2024. doi: 10.1109/TCBB.2022.3204188
[16]
M. Shang, Y. Yuan, X. Luo, and M. C. Zhou, “An α–β-divergence-generalized recommender for highly accurate predictions of missing user preferences,” IEEE Trans. Cybern., vol. 52, no. 8, pp. 8006–8018, Aug. 2022. doi: 10.1109/TCYB.2020.3026425
[17]
T. Huang, C. Liang, D. Wu, and Y. He, “A debiasing autoencoder for recommender system,” IEEE Trans. Consum. Electron., vol. 70, no. 1, pp. 3603–3613, Feb. 2024. doi: 10.1109/TCE.2023.3281521
[18]
Z. Zheng, X. Li, M. Tang, F. Xie, and M. R. Lyu, “Web service QoS prediction via collaborative filtering: A survey,” IEEE Trans. Serv. Comput., vol. 15, no. 4, pp. 2455–2472, Jul.-Aug. 2022. doi: 10.1109/TSC.2020.2995571
[19]
D. Wu, P. Zhang, Y. He, and X. Luo, “A double-space and double-norm ensembled latent factor model for highly accurate web service QoS prediction,” IEEE Trans. Serv. Comput., vol. 16, no. 2, pp. 802–814, Mar.-Apr. 2023. doi: 10.1109/TSC.2022.3178543
[20]
S. H. Ghafouri, S. M. Hashemi, and P. C. K. Hung, “A survey on web service QoS prediction methods,” IEEE Trans. Serv. Comput., vol. 15, no. 4, pp. 2439–2454, Jul.-Aug. 2022. doi: 10.1109/TSC.2020.2980793
[21]
P. Wu, M. Pei, T. Wang, Y. Liu, Z. Liu, and L. Zhong, “A low-rank Bayesian temporal matrix factorization for the transfer time prediction between metro and bus systems,” IEEE Trans. Intell. Transp. Syst., vol. 25, no. 7, pp. 7206–7222, Jul. 2024. doi: 10.1109/TITS.2023.3349211
[22]
J. Li, H. Wu, J. Chen, Q. He, and C.-H. Hsu, “Topology-aware neural model for highly accurate QoS prediction,” IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 7, pp. 1538–1552, Jul. 2022. doi: 10.1109/TPDS.2021.3116865
[23]
M. Li, Y. Song, D. Ding, and R. Sun, “Triple factorization-based SNLF representation with improved momentum-incorporated AGD: A knowledge transfer approach,” IEEE Trans. Knowl. Data Eng., vol. 36, no. 12, pp. 9448–9463, Dec. 2024. doi: 10.1109/TKDE.2024.3450469
[24]
J. Deng, X. Ran, Y. Wang, L. Y. Zhang, and J. Guo, “Probabilistic matrix factorization recommendation approach for integrating multiple information sources,” IEEE Trans. Syst. Man Cybern. Syst., vol. 53, no. 10, pp. 6220–6231, Oct. 2023. doi: 10.1109/TSMC.2023.3281706
[25]
H. Huang, G. Zhou, N. Liang, Q. Zhao, and S. Xie, “Diverse deep matrix factorization with hypergraph regularization for multi-view data representation,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 11, pp. 2154–2167, Nov. 2023. doi: 10.1109/JAS.2022.105980
[26]
M. Chen, M. Gong, and X. Li, “Feature weighted non-negative matrix factorization,” IEEE Trans. Cybern., vol. 53, no. 2, pp. 1093–1105, Feb. 2023. doi: 10.1109/TCYB.2021.3100067
[27]
X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proc. 26th Int. Conf. World Wide Web, Perth, Australia, 2017, pp. 173-182.
[28]
R. Duan, C. Jiang, and H. K. Jain, “Combining review-based collaborative filtering and matrix factorization: A solution to rating’s sparsity problem,” Decis. Support Syst., vol. 156, p. 113748, May 2022. doi: 10.1016/j.dss.2022.113748
[29]
M. Li and Y. Song, “An improved non-negative latent factor model for missing data estimation via extragradient-based alternating direction method,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 9, pp. 5640–5653, Sep. 2023. doi: 10.1109/TNNLS.2021.3130289
[30]
H. Wu, X. Luo, and M. C. Zhou, “Advancing non-negative latent factorization of tensors with diversified regularization schemes,” IEEE Trans. Serv. Comput., vol. 15, no. 3, pp. 1334–1344, May-Jun. 2022. doi: 10.1109/TSC.2020.2988760
[31]
Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, “Aspect-aware latent factor model: Rating prediction with ratings and reviews,” in Proc. World Wide Web Conf., Lyon, France, 2018, pp. 639-648.
[32]
D. Wu, Y. He, X. Luo, and M. C. Zhou, “A latent factor analysis-based approach to online sparse streaming feature selection,” IEEE Trans. Syst. Man Cybern. Syst., vol. 52, no. 11, pp. 6744–6758, Nov. 2022. doi: 10.1109/TSMC.2021.3096065
[33]
Z. Cao, Y. Zhang, J. Guan, S. Zhou, and G. Chen, “Link weight prediction using weight perturbation and latent factor,” IEEE Trans. Cybern., vol. 52, no. 3, pp. 1785–1797, Mar. 2022. doi: 10.1109/TCYB.2020.2995595
[34]
X. Luo, Y. Zhong, Z. Wang, and M. Li, “An alternating-direction-method of multipliers-incorporated approach to symmetric non-negative latent factor analysis,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 8, pp. 4826–4840, Aug. 2023. doi: 10.1109/TNNLS.2021.3125774
[35]
H. Li, K. Li, J. An, W. Zheng, and K. Li, “An efficient manifold regularized sparse non-negative matrix factorization model for large-scale recommender systems on GPUs,” Inf. Sci., vol. 496, pp. 464–484, Sep. 2019. doi: 10.1016/j.ins.2018.07.060
[36]
D. Wu, M. Shang, X. Luo, and Z. Wang, “An L1-and-L2-norm-oriented latent factor model for recommender systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 10, pp. 5775–5788, Oct. 2022. doi: 10.1109/TNNLS.2021.3071392
[37]
A. Vandaele, N. Gillis, Q. Lei, K. Zhong, and I. Dhillon, “Efficient and non-convex coordinate descent for symmetric nonnegative matrix factorization,” IEEE Trans. Signal Process., vol. 64, no. 21, pp. 5571–5584, Nov. 2016. doi: 10.1109/TSP.2016.2591510
[38]
Y. Ding, J. Tang, and F. Guo, “Identification of drug-target interactions via dual laplacian regularized least squares with multiple kernel fusion,” Knowl. Based Syst., vol. 204, p. 106254, Sep. 2020. doi: 10.1016/j.knosys.2020.106254
[39]
S. Zhou, R. Kannan, and V. K. Prasanna, “Accelerating stochastic gradient descent based matrix factorization on FPGA,” IEEE Trans. Parallel Distrib. Syst., vol. 31, no. 8, pp. 1897–1911, Aug. 2020. doi: 10.1109/TPDS.2020.2974744
[40]
X. Luo, Y. Yuan, S. Chen, N. Zeng, and Z. Wang, “Position-transitional particle swarm optimization-incorporated latent factor analysis,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 8, pp. 3958–3970, Aug. 2022. doi: 10.1109/TKDE.2020.3033324
[41]
L. Hou, D. Chu, and L.-Z. Liao, “A progressive hierarchical alternating least squares method for symmetric nonnegative matrix factorization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 5, pp. 5355–5369, May 2023.
[42]
R. M. Bell and Y. Koren, “Scalable collaborative filtering with jointly derived neighborhood interpolation weights,” in Proc. 7th IEEE Int. Conf. Data Mining, Omaha, USA, 2007, pp. 43-52.
[43]
I. Pilászy, D. Zibriczky, and D. Tikk, “Fast ALS-based matrix factorization for explicit and implicit feedback datasets,” in Proc. 4th ACM Conf. Recommender Syst., Barcelona, Spain, 2010, pp. 71-78.
[44]
H.-F. Yu, C.-J. Hsieh, S. Si, and I. Dhillon, “Scalable coordinate descent approaches to parallel matrix factorization for recommender systems,” in Proc. 12th IEEE Int. Conf. Data Mining, Brussels, Belgium, 2012, pp. 765-774.
[45]
I. Nisa, A. Sukumaran-Rajam, R. Kunchum, and P. Sadayappan, “Parallel CCD++ on GPU for matrix factorization,” in Proc. 10th Gener. Purpose GPUs, Austin, USA, 2017, pp. 73-83.
[46]
K. Xie, Y. Chen, X. Wang, G. Xie, J. Cao, and J. Wen, “Accurate and fast recovery of network monitoring data: A GPU accelerated matrix completion,” IEEE/ACM Trans. Netw., vol. 28, no. 3, pp. 958–971, Jun. 2020. doi: 10.1109/TNET.2020.2976129
[47]
F. Bi, T. He, and X. Luo, “A fast nonnegative autoencoder-based approach to latent feature analysis on high-dimensional and incomplete data,” IEEE Trans. Serv. Comput., vol. 17, no. 3, pp. 733–746, May-Jun. 2024. doi: 10.1109/TSC.2023.3319713
[48]
C. Leng, H. Zhang, G. Cai, I. Cheng, and A. Basu, “Graph regularized Lp smooth non-negative matrix factorization for data representation,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 2, pp. 584–595, Mar. 2019. doi: 10.1109/JAS.2019.1911417
[49]
D. Wu, P. Zhang, Y. He, and X. Luo, “MMLF: Multi-metric latent feature analysis for high-dimensional and incomplete data,” IEEE Trans. Serv. Comput., vol. 17, no. 2, pp. 575–588, Mar.-Apr. 2024. doi: 10.1109/TSC.2023.3331570
[50]
K. Xie, X. Dong, Y. Zhang, X. Zhang, Q. Guo, and S. Wang, “Learning-based attribute-augmented proximity matrix factorization for attributed network embedding,” IEEE Trans. Knowl. Data Eng., vol. 36, no. 11, pp. 6517–6531, Nov. 2024. doi: 10.1109/TKDE.2024.3385847
[51]
S. Gulcan, M. M. Ozdal, and C. Aykanat, “Load balanced locality-aware parallel SGD on multicore architectures for latent factor based collaborative filtering,” Future Gener. Comput. Syst., vol. 146, pp. 207–221, Sep. 2023. doi: 10.1016/j.future.2023.04.007
[52]
X. Xie, W. Tan, L. L. Fong, and Y. Liang, “CuMF_SGD: Parallelized stochastic gradient descent for matrix factorization on GPUs,” in Proc. 26th Int. Symp. High-Performance Parallel and Distributed Computing, Washington, USA, 2017, pp. 79-92.
[53]
X. Luo, D. Wang, M. C. Zhou, and H. Yuan, “Latent factor-based recommenders relying on extended stochastic gradient descent algorithms,” IEEE Trans. Syst. Man Cybern. Syst., vol. 51, no. 2, pp. 916–926, Feb. 2021. doi: 10.1109/TSMC.2018.2884191
[54]
X. Luo, Z. Liu, S. Li, M. Shang, and Z. Wang, “A fast non-negative latent factor model based on generalized momentum method,” IEEE Trans. Syst. Man Cybern. Syst., vol. 51, no. 1, pp. 610–620, Jan. 2021. doi: 10.1109/TSMC.2018.2875452
[55]
W. Li, R. Wang, and X. Luo, “A generalized Nesterov-accelerated second-order latent factor model for high-dimensional and incomplete data,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 1, pp. 1518–1532, Jan. 2025. doi: 10.1109/TNNLS.2023.3321915
[56]
N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural Netw., vol. 12, no. 1, pp. 145–151, Jan. 1999. doi: 10.1016/S0893-6080(98)00116-6
[57]
X. Luo, Y. Zhou, Z. Liu, L. Hu, and M. C. Zhou, “Generalized Nesterov’s acceleration-incorporated, non-negative and adaptive latent factor analysis,” IEEE Trans. Serv. Comput., vol. 15, no. 5, pp. 2809–2823, Sep.-Oct. 2022. doi: 10.1109/TSC.2021.3069108
[58]
G. Qu and N. Li, “Accelerated distributed Nesterov gradient descent,” IEEE Trans. Autom. Control, vol. 65, no. 6, pp. 2566–2581, Jun. 2020. doi: 10.1109/TAC.2019.2937496
[59]
D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learning Representations, Ithaca, USA, 2015.
[60]
K. Büyükkaya, M. O. Karsavuran, and C. Aykanat, “Stochastic gradient descent for matrix completion: Hybrid parallelization on shared- and distributed-memory systems,” Knowl. Based Syst., vol. 283, p. 111176, Jan. 2024. doi: 10.1016/j.knosys.2023.111176
[61]
N. Abubaker, O. Caglayan, M. O. Karsavuran, and C. Aykanat, “Minimizing staleness and communication overhead in distributed SGD for collaborative filtering,” IEEE Trans. Comput., vol. 72, no. 10, pp. 2925–2937, Oct. 2023. doi: 10.1109/TC.2023.3275107
[62]
W. Qin, X. Luo, S. Li, and M. C. Zhou, “Parallel adaptive stochastic gradient descent algorithms for latent factor analysis of high-dimensional and incomplete industrial data,” IEEE Trans. Automat. Sci. Eng., vol. 21, no. 3, pp. 2716–2729, Jul. 2024. doi: 10.1109/TASE.2023.3267609
[63]
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin, “A fast parallel stochastic gradient method for matrix factorization in shared memory systems,” ACM Trans. Intell. Syst. Technol., vol. 6, no. 1, p. 2, Apr. 2015.
[64]
Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin, “A fast parallel SGD for matrix factorization in shared memory systems,” in Proc. 7th ACM Conf. Recommender Systems, Hong Kong, China, 2013, pp. 249-256.
[65]
R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis, “Large-scale matrix factorization with distributed stochastic gradient descent,” in Proc. 17th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, San Diego, USA, 2011, pp. 69-77.
[66]
X. Luo, H. Liu, G. Gou, Y. Xia, and Q. Zhu, “A parallel matrix factorization based recommender by alternating stochastic gradient decent,” Eng. Appl. Artif. Intell., vol. 25, no. 7, pp. 1403–1412, Oct. 2012. doi: 10.1016/j.engappai.2011.10.011
[67]
H. Li, K. Li, J. An, and K. Li, “MSGD: A novel matrix factorization approach for large-scale collaborative filtering recommender systems on GPUs,” IEEE Trans. Parallel Distrib. Syst., vol. 29, no. 7, pp. 1530–1544, Jul. 2018. doi: 10.1109/TPDS.2017.2718515
[68]
F. Elahi, M. Fazlali, H. T. Malazi, and M. Elahi, “Parallel fractional stochastic gradient descent with adaptive learning for recommender systems,” IEEE Trans. Parallel Distrib. Syst., vol. 35, no. 3, pp. 470–483, Mar. 2024. doi: 10.1109/TPDS.2022.3185212
[69]
C. Teflioudi, F. Makari, and R. Gemulla, “Distributed matrix completion,” in Proc. IEEE 12th Int. Conf. Data Mining, Brussels, Belgium, 2012, pp. 655-664.
[70]
F. Makari, C. Teflioudi, R. Gemulla, P. Haas, and Y. Sismanis, “Shared-memory and shared-nothing stochastic gradient descent algorithms for matrix completion,” Knowl. Inf. Syst., vol. 42, no. 3, pp. 493–523, Mar. 2015. doi: 10.1007/s10115-013-0718-7
[71]
W.-S. Chen, K. Xie, R. Liu, and B. Pan, “Symmetric nonnegative matrix factorization: A systematic review,” Neurocomputing, vol. 557, p. 126721, Nov. 2023. doi: 10.1016/j.neucom.2023.126721
[72]
F. Saberi-Movahed, K. Berahmand, R. Sheikhpour, Y. Li, and S. Pan, “Nonnegative matrix factorization in dimensionality reduction: A survey,” ACM Comput. Surv., vol. 58, no. 5, p. 118, Apr. 2026.
[73]
P. De Handschutter, N. Gillis, and X. Siebert, “A survey on deep matrix factorizations,” Comput. Sci. Rev., vol. 42, p. 100423, Nov. 2021. doi: 10.1016/j.cosrev.2021.100423
[74]
Y. Chi, Y. M. Lu, and Y. Chen, “Nonconvex optimization meets low-rank matrix factorization: An overview,” IEEE Trans. Signal Process., vol. 67, no. 20, pp. 5239–5269, Oct. 2019. doi: 10.1109/TSP.2019.2937282
[75]
H. Salehi, A. Gorodetsky, R. Solhmirzaei, and P. Jiao, “High-dimensional data analytics in civil engineering: A review on matrix and tensor decomposition,” Eng. Appl. Artif. Intell., vol. 125, p. 106659, Oct. 2023. doi: 10.1016/j.engappai.2023.106659
[76]
K. Berahmand, M. Mohammadi, R. Sheikhpour, Y. Li, and Y. Xu, “WSNMF: Weighted symmetric nonnegative matrix factorization for attributed graph clustering,” Neurocomputing, vol. 566, p. 127041, Jan. 2024. doi: 10.1016/j.neucom.2023.127041
[77]
C. He, X. Fei, Q. Cheng, H. Li, Z. Hu, and Y. Tang, “A survey of community detection in complex networks using nonnegative matrix factorization,” IEEE Trans. Comput. Soc. Syst., vol. 9, no. 2, pp. 440–457, Apr. 2022. doi: 10.1109/TCSS.2021.3114419
[78]
W. Tan, L. Cao, and L. Fong, “Faster and cheaper: Parallelizing large-scale matrix factorization on GPUs,” in Proc. 25th ACM Int. Symp. High-Performance Parallel and Distributed Computing, Kyoto, Japan, 2016, pp. 219-230.
[79]
W. Tan, S. Chang, L. Fong, C. Li, Z. Wang, and L. Cao, “Matrix factorization on GPUs with memory optimization and approximate computing,” in Proc. 47th Int. Conf. Parallel Processing, Eugene, USA, 2018, p. 26.
[80]
Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, Aug. 2009.
[81]
H.-F. Yu, C.-J. Hsieh, S. Si, and I. S. Dhillon, “Parallel matrix factorization for recommender systems,” Knowl. Inf. Syst., vol. 41, no. 3, pp. 793–819, Dec. 2014. doi: 10.1007/s10115-013-0682-2
[82]
Y. Nishioka and K. Taura, “Scalable task-parallel SGD on matrix factorization in multicore architectures,” in Proc. 29th IEEE Int. Parallel and Distributed Processing Symp. Workshop, Hyderabad, India, 2015, pp. 1178-1184.
[83]
F. Niu, B. Recht, C. Re, and S. J. Wright, “HOGWILD!: A lock-free approach to parallelizing stochastic gradient descent,” in Proc. 25th Int. Conf. Neural Information Processing Systems, Granada, Spain, 2011, pp. 693-701.
[84]
W. Qin and X. Luo, “Asynchronous parallel fuzzy stochastic gradient descent for high-dimensional incomplete data representation,” IEEE Trans. Fuzzy Syst., vol. 32, no. 2, pp. 445–459, Feb. 2024. doi: 10.1109/TFUZZ.2023.3300370
[85]
W. Qin and X. Luo, “An asynchronously alternative stochastic gradient descent algorithm for efficiently parallel latent feature analysis on shared-memory,” in Proc. 13th IEEE Int. Conf. Knowledge Graph, Orlando, USA, 2022, pp. 217-224.
[86]
F. Petroni and L. Querzoni, “GASGD: Stochastic gradient descent for distributed asynchronous matrix completion via graph partitioning,” in Proc. 8th ACM Conf. Recommender Systems, Foster City, USA, 2014, pp. 241-248.
[87]
R. Guo, F. Zhang, L. Wang, W. Zhang, X. Lei, R. Ranjan, and A. Y. Zomaya, “BaPa: A novel approach of improving load balance in parallel matrix factorization for recommender systems,” IEEE Trans. Comput., vol. 70, no. 5, pp. 789–802, May 2021. doi: 10.1109/TC.2020.2997051
[88]
Z.-Q. Yu, X.-J. Shi, L. Yan, and W.-J. Li, “Distributed stochastic ADMM for matrix factorization,” in Proc. 23rd ACM Int. Conf. Information and Knowledge Management, Shanghai, China, 2014, pp. 1259-1268.
[89]
F. Zhang, E. Xue, R. Guo, G. Qu, G. Zhao, and A. Y. Zomaya, “DS-ADMM++: A novel distributed quantized ADMM to speed up differentially private matrix factorization,” IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 6, pp. 1289–1302, Jun. 2022. doi: 10.1109/TPDS.2021.3110104
[90]
H. Yun, H.-F. Yu, C.-J. Hsieh, S. V. N. Vishwanathan, and I. Dhillon, “NOMAD: Non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion,” Proc. VLDB Endowment, vol. 7, no. 11, pp. 975–986, Jul. 2014. doi: 10.14778/2732967.2732973
[91]
B. Li, S. Tata, and Y. Sismanis, “Sparkler: Supporting large-scale matrix factorization,” in Proc. 16th Int. Conf. Extending Database Technology, Genoa, Italy, 2013, pp. 625-636.
[92]
F. Li, B. Wu, L. Xu, C. Shi, and J. Shi, “A fast distributed stochastic gradient descent algorithm for matrix factorization,” in Proc. 3rd Int. Conf. Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, New York, USA, 2014, pp. 77-87.
[93]
X. Shi, Q. He, X. Luo, Y. Bai, and M. Shang, “Large-scale and scalable latent factor analysis via distributed alternative stochastic gradient descent for recommender systems,” IEEE Trans. Big Data, vol. 8, no. 2, pp. 420–431, Apr. 2022.
[94]
J. Jin, S. Lai, S. Hu, J. Lin, and X. Lin, “GPUSGD: A GPU-accelerated stochastic gradient descent algorithm for matrix factorization,” Concurrency Computat. Pract. Exp., vol. 28, no. 14, pp. 3844–3865, Sep. 2016. doi: 10.1002/cpe.3722
[95]
J. Chen, J. Fang, W. Liu, T. Tang, and C. Yang, “clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization,” Future Gener. Comput. Syst., vol. 108, pp. 1192–1205, Jul. 2020. doi: 10.1016/j.future.2018.04.071
[96]
J. Chen, J. Fang, W. Liu, and C. Yang, “BALS: Blocked alternating least squares for parallel sparse matrix factorization on GPUs,” IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 9, pp. 2291–2302, Sep. 2021. doi: 10.1109/TPDS.2021.3064942
[97]
Y. Yu, D. Wen, Y. Zhang, X. Wang, W. Zhang, and X. Lin, “Efficient matrix factorization on heterogeneous CPU-GPU systems,” in Proc. 37th IEEE Int. Conf. Data Engineering, Chania, Greece, 2021, pp. 1871-1876.
[98]
Y. Huang, Y. Yin, Y. Liu, S. He, Y. Bai, and R. Li, “A novel multi-CPU/GPU collaborative computing framework for SGD-based matrix factorization,” in Proc. 50th Int. Conf. Parallel Processing, Lemont, USA, 2021, p. 76.
[99]
Y. Huang, Y. Liu, Y. Bai, S. Chen, and R. Li, “UMA-MF: A unified multi-CPU/GPU asynchronous computing framework for SGD-based matrix factorization,” IEEE Trans. Parallel Distrib. Syst., vol. 34, no. 11, pp. 2978–2993, Nov. 2023. doi: 10.1109/TPDS.2023.3317535
[100]
D. Lee, J. Oh, and H. Yu, “OCAM: Out-of-core coordinate descent algorithm for matrix completion,” Inf. Sci., vol. 514, pp. 587–604, Apr. 2020. doi: 10.1016/j.ins.2019.09.077
[101]
J. Oh, W.-S. Han, H. Yu, and X. Jiang, “Fast and robust parallel SGD matrix factorization,” in Proc. 21st ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Sydney, Australia, 2015, pp. 865-874.
[102]
D. Lee, J. Oh, C. Faloutsos, B. Kim, and H. Yu, “Disk-based matrix completion for memory limited devices,” in Proc. 27th ACM Int. Conf. Information and Knowledge Management, Torino, Italy, 2018, pp. 1093-1102.
[103]
H. Mehta, S. Rendle, W. Krichene, and L. Zhang, “ALX: Large scale matrix factorization on TPUs,” arXiv preprint arXiv: 2112.02194, 2021.
[104]
S. Zhou, R. Kannan, Y. Min, and V. K. Prasanna, “FASTCF: FPGA-based accelerator for stochastic-gradient-descent-based collaborative filtering,” in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, Monterey, USA, 2018, pp. 259-268.
[105]
S. M. R. Shahshahani and H. R. Mahdiani, “A high-performance scalable shared-memory SVD processor architecture based on Jacobi algorithm and Batcher’s sorting network,” IEEE Trans. Circuits Syst. I Regul. Pap., vol. 67, no. 6, pp. 1912–1924, Jun. 2020. doi: 10.1109/TCSI.2020.2973249
[106]
S. Huang, M. Wang, and Y. Cui, “Traffic-aware buffer management in shared memory switches,” IEEE/ACM Trans. Netw., vol. 30, no. 6, pp. 2559–2573, Dec. 2022. doi: 10.1109/TNET.2022.3173930
[107]
Y. A. Chen and Y. C. Chung, “An unequal caching strategy for shared-memory graph analytics,” IEEE Trans. Parallel Distrib. Syst., vol. 34, no. 3, pp. 955–967, Mar. 2023. doi: 10.1109/TPDS.2022.3218885
[108]
T. M. Shami, A. A. El-Saleh, M. Alswaitti, Q. Al-Tashi, M. A. Summakieh, and S. Mirjalili, “Particle swarm optimization: A comprehensive survey,” IEEE Access, vol. 10, pp. 10031–10061, Jan. 2022. doi: 10.1109/ACCESS.2022.3142859
[109]
A. T. Nguyen, T. Taniguchi, L. Eciolaza, V. Campos, R. Palhares, and M. Sugeno, “Fuzzy control systems: Past, present and future,” IEEE Comput. Intell. Mag., vol. 14, no. 1, pp. 56–68, Feb. 2019. doi: 10.1109/MCI.2018.2881644
[110]
T. D. Dang, D. Hoang, and D. N. Nguyen, “Trust-based scheduling framework for big data processing with MapReduce,” IEEE Trans. Serv. Comput., vol. 15, no. 1, pp. 279–293, Jan.-Feb. 2022. doi: 10.1109/TSC.2019.2938959
[111]
T.-H. Chang, M. Hong, W.-C. Liao, and X. Wang, “Asynchronous distributed ADMM for large-scale optimization—part I: Algorithm and convergence analysis,” IEEE Trans. Signal Process., vol. 64, no. 12, pp. 3118–3130, Jun. 2016. doi: 10.1109/TSP.2016.2537271
[112]
F. Bi, X. Luo, B. Shen, H. Dong, and Z. Wang, “Proximal alternating-direction-method-of-multipliers-incorporated nonnegative latent factor analysis,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 6, pp. 1388–1406, Jun. 2023. doi: 10.1109/JAS.2023.123474
[113]
M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, and I. Stoica, “Apache Spark: A unified engine for big data processing,” Commun. ACM, vol. 59, no. 11, pp. 56–65, Nov. 2016. doi: 10.1145/2934664
[114]
X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. B. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar, “MLlib: Machine learning in apache spark,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 1235–1241, Jan. 2016.
[115]
A. Mostafaeipour, A. J. Rafsanjani, M. Ahmadi, and J. A. Dhanraj, “Investigating the performance of Hadoop and Spark platforms on machine learning algorithms,” J. Supercomput., vol. 77, no. 2, pp. 1273–1300, Feb. 2021. doi: 10.1007/s11227-020-03328-5
[116]
M. Grossman, M. Breternitz, and V. Sarkar, “HadoopCL2: Motivating the design of a distributed, heterogeneous programming system with machine-learning applications,” IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 3, pp. 762–775, Mar. 2016. doi: 10.1109/TPDS.2015.2414943
[117]
M. Khan, Y. Jin, M. Li, Y. Xiang, and C. Jiang, “Hadoop performance modeling for job estimation and resource provisioning,” IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 2, pp. 441–454, Feb. 2016. doi: 10.1109/TPDS.2015.2405552
[118]
B. Fitzpatrick, “Distributed caching with memcached,” Linux J., vol. 2004, no. 124, p. 5, Aug. 2004.
[119]
P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, “ZooKeeper: Wait-free coordination for internet-scale systems,” in Proc. USENIX Annual Technical Conf., Boston, USA, 2010, pp. 1-14.
[120]
S. M. Nabavinejad, S. Reda, and M. Ebrahimi, “Coordinated batching and DVFS for DNN inference on GPU accelerators,” IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 10, pp. 2496–2508, Oct. 2022. doi: 10.1109/TPDS.2022.3144614
[121]
J. Kang, B. Khaleghi, T. Rosing, and Y. Kim, “OpenHD: A GPU-powered framework for hyperdimensional computing,” IEEE Trans. Comput., vol. 71, no. 11, pp. 2753–2765, Nov. 2022. doi: 10.1109/TC.2022.3179226
[122]
J. Choquette, W. Gandhi, O. Giroux, N. Stam, and R. Krashinsky, “NVIDIA A100 tensor core GPU: Performance and innovation,” IEEE Micro, vol. 41, no. 2, pp. 29–35, Apr. 2021. doi: 10.1109/MM.2021.3061394
[123]
J. Burgess, “RTX on—the NVIDIA Turing GPU,” IEEE Micro, vol. 40, no. 2, pp. 36–44, Mar.-Apr. 2020. doi: 10.1109/MM.2020.2971677
[124]
U. Borštnik, J. VandeVondele, V. Weber, and J. Hutter, “Sparse matrix multiplication: The distributed block-compressed sparse row library,” Parallel Comput., vol. 40, no. 5-6, pp. 47–58, May 2014. doi: 10.1016/j.parco.2014.03.012
[125]
Y. Chen, K. Li, W. Yang, G. Xiao, X. Xie, and T. Li, “Performance-aware model for sparse matrix-matrix multiplication on the Sunway TaihuLight supercomputer,” IEEE Trans. Parallel Distrib. Syst., vol. 30, no. 4, pp. 923–938, Apr. 2019. doi: 10.1109/TPDS.2018.2871189
[126]
A. V. Rodrigues, A. Jorge, and I. Dutra, “Accelerating recommender systems using GPUs,” in Proc. 30th Annu. ACM Symp. Applied Computing, Salamanca, Spain, 2015, pp. 879-884.
[127]
G. Tan, C. Shui, Y. Wang, X. Yu, and Y. Yan, “Optimizing the LINPACK algorithm for large-scale PCIe-based CPU-GPU heterogeneous systems,” IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 9, pp. 2367–2380, Sep. 2021. doi: 10.1109/TPDS.2021.3067731
[128]
A. Li, S. L. Song, J. Chen, J. Li, X. Liu, N. R. Tallent, and K. J. Barker, “Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect,” IEEE Trans. Parallel Distrib. Syst., vol. 31, no. 1, pp. 94–110, Jan. 2020. doi: 10.1109/TPDS.2019.2928289
[129]
D. D. Almeida, L. Bragança, F. S. Torres, R. Ferreira, and J. A. M. Nacif, “HAMBug: A hybrid CPU-FPGA system to detect race conditions,” IEEE Trans. Circuits Syst. II Express Briefs, vol. 68, no. 9, pp. 3158–3162, Sep. 2021.
[130]
W. Yang, Y. Chen, Z. Huang, H. Zhang, and H. Gu, “Routing and wavelength assignment for multiple multicasts in optical network-on-chip (ONoC),” IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., vol. 42, no. 12, pp. 4934–4947, Dec. 2023. doi: 10.1109/TCAD.2023.3274951
[131]
D. Li, C. Chen, Q. Lv, H. Gu, T. Lu, L. Shang, N. Gu, and S. M. Chu, “AdaError: An adaptive learning rate method for matrix approximation-based collaborative filtering,” in Proc. World Wide Web Conf., Lyon, France, 2018, pp. 741-751.
[132]
Y. Yuan, D. H. K. Tsang, and V. K. N. Lau, “Combining conjugate gradient and momentum for unconstrained stochastic optimization with applications to machine learning,” IEEE Internet Things J., vol. 11, no. 13, pp. 23236–23254, Jul. 2024. doi: 10.1109/JIOT.2024.3376821
[133]
G. Zhou, R. Bo, L. Chien, X. Zhang, F. Shi, C. Xu, and Y. Feng, “GPU-based batch LU-factorization solver for concurrent analysis of massive power flows,” IEEE Trans. Power Syst., vol. 32, no. 6, pp. 4975–4977, Nov. 2017. doi: 10.1109/TPWRS.2017.2662322
[134]
C. Zhang, F. Zhang, X. Guo, B. He, X. Zhang, and X. Du, “iMLBench: A machine learning benchmark suite for CPU-GPU integrated architectures,” IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 7, pp. 1740–1752, Jul. 2021. doi: 10.1109/TPDS.2020.3046870
[135]
Z. Guo, T. W. Huang, and Y. Lin, “Accelerating static timing analysis using CPU-GPU heterogeneous parallelism,” IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., vol. 42, no. 12, pp. 4973–4984, Dec. 2023. doi: 10.1109/TCAD.2023.3286261
[136]
Q. Zeng, Y. Du, K. Huang, and K. K. Leung, “Energy-efficient resource management for federated edge learning with CPU-GPU heterogeneous computing,” IEEE Trans. Wireless Commun., vol. 20, no. 12, pp. 7947–7962, Dec. 2021. doi: 10.1109/TWC.2021.3088910
[137]
M. E. Elbtity, P. S. Chandarana, B. Reidy, J. K. Eshraghian, and R. Zand, “APTPU: Approximate computing based tensor processing unit,” IEEE Trans. Circuits Syst. I Regul. Pap., vol. 69, no. 12, pp. 5135–5146, Dec. 2022. doi: 10.1109/TCSI.2022.3206262
[138]
X. H. Wen and M. C. Zhou, “Evolution and role of optimizers in training deep learning models,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 10, pp. 2039–2042, Oct. 2024. doi: 10.1109/JAS.2024.124806
[139]
M. Cui, L. Li, M. Zhou, and A. Abusorrah, “Surrogate-assisted autoencoder-embedded evolutionary optimization algorithm to solve high-dimensional expensive problems,” IEEE Trans. Evol. Comput., vol. 26, no. 4, pp. 676–689, Aug. 2022. doi: 10.1109/TEVC.2021.3113923
[140]
H. Yuan, Q. Hu, J. Bi, J. Lü, J. Zhang, and M. Zhou, “Profit-optimized computation offloading with autoencoder-assisted evolution in large-scale mobile-edge computing,” IEEE Internet Things J., vol. 10, no. 13, pp. 11896–11909, Jul. 2023. doi: 10.1109/JIOT.2023.3244665
[141]
M. Cui, L. Li, M. C. Zhou, J. Li, A. Abusorrah, and K. Sedraoui, “A bi-population cooperative optimization algorithm assisted by an autoencoder for medium-scale expensive problems,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 11, pp. 1952–1966, Nov. 2022. doi: 10.1109/JAS.2022.105425
[142]
E. Karydi and K. Margaritis, “Parallel and distributed collaborative filtering: A survey,” ACM Comput. Surv., vol. 49, no. 2, p. 37, Jun. 2017.
[143]
Y. X. Wang and Y. J. Zhang, “Nonnegative matrix factorization: A comprehensive review,” IEEE Trans. Knowl. Data Eng., vol. 25, no. 6, pp. 1336–1353, Jun. 2013. doi: 10.1109/TKDE.2012.51
[144]
Z. Liu, X. Luo, and Z. Wang, “Convergence analysis of single latent factor-dependent, nonnegative, and multiplicative update-based nonnegative latent factor models,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 4, pp. 1737–1749, Apr. 2021. doi: 10.1109/TNNLS.2020.2990990
[145]
C. Peng, Y. Zhang, Y. Chen, Z. Kang, C. Chen, and Q. Cheng, “Log-based sparse nonnegative matrix factorization for data representation,” Knowl. Based Syst., vol. 251, p. 109127, Sep. 2022. doi: 10.1016/j.knosys.2022.109127
[146]
C. Peng, P. Zhang, Y. Chen, Z. Kang, C. Chen, and Q. Cheng, “Fine-grained bipartite concept factorization for clustering,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2024, pp. 26264-26274.
[147]
Z. Wang, R. Yuan, J. Fu, K.-C. Wong, and C. Peng, “Core–periphery detection based on masked Bayesian nonnegative matrix factorization,” IEEE Trans. Comput. Soc. Syst., vol. 11, no. 3, pp. 4102–4113, Jun. 2024. doi: 10.1109/TCSS.2023.3347406
[148]
J. Wang and X.-L. Zhang, “Deep NMF topic modeling,” Neurocomputing, vol. 515, pp. 157–173, Jan. 2023. doi: 10.1016/j.neucom.2022.10.002
[149]
R. Kannan, G. Ballard, and H. Park, “A high-performance parallel algorithm for nonnegative matrix factorization,” in Proc. 21st ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Barcelona, Spain, 2016, p. 9.
[150]
G. E. Moon, J. A. Ellis, A. Sukumaran-Rajam, S. Parthasarathy, and P. Sadayappan, “ALO-NMF: Accelerated locality-optimized non-negative matrix factorization,” in Proc. 26th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, Virtual Event, USA, 2020, pp. 1758-1767.
[151]
E. Mejía-Roa, D. Tabas-Madrid, J. Setoain, C. García, F. Tirado, and A. Pascual-Montano, “NMF-mGPU: Non-negative matrix factorization on multi-GPU systems,” BMC Bioinform., vol. 16, no. 1, p. 43, Feb. 2015. doi: 10.1186/s12859-015-0485-4