A journal of the IEEE and the CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 12 Issue 11
Nov. 2025

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 19.2, Top 1 (SCI Q1)
  • CiteScore: 28.2, Top 1% (Q1)
  • Google Scholar h5-index: 95, Top 5
X. Wang, Y. Jin, H. Yu, Y. Cen, T. Wang, and Y. Li, “Point-MASNet: Masked autoencoder-based sampling network for 3D point cloud,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 11, pp. 2300–2313, Nov. 2025. doi: 10.1109/JAS.2024.125088

Point-MASNet: Masked Autoencoder-Based Sampling Network for 3D Point Cloud

doi: 10.1109/JAS.2024.125088
Funds:  This work was supported by the National Key Research and Development Program of China (2022YFB3103500), the National Natural Science Foundation of China (62473033, 62571027), the Beijing Natural Science Foundation (L231012), and the State Scholarship Fund of the China Scholarship Council.
  • Task-oriented point cloud sampling aims to select a representative subset of the input tailored to a specific application scenario and task. However, existing approaches rarely address the redundancy caused by local structural similarities in 3D objects, which limits sampling performance. To address this issue, this paper introduces Point-MASNet, a novel task-oriented point cloud sampling network inspired by the masked autoencoder mechanism. Point-MASNet employs a voxel-based random non-overlapping masking strategy that lets the model selectively learn distinctive local structural features from the input, effectively mitigating redundancy and improving the representativeness of the sampled subset. In addition, we propose a lightweight, symmetrically structured keypoint reconstruction network designed as an autoencoder, which efficiently extracts latent features while enabling refined reconstructions. Extensive experiments demonstrate that Point-MASNet achieves competitive sampling performance across classification, registration, and reconstruction tasks.
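A minimal sketch of the voxel-based random non-overlapping masking idea described in the abstract, assuming a plain NumPy pipeline. The function name, voxel size, and mask ratio below are illustrative choices, not the authors' implementation: the cloud is partitioned into disjoint voxels, a random fraction of voxels is dropped, and only points in the surviving voxels are kept as encoder input.

```python
import numpy as np

def voxel_random_mask(points, voxel_size=0.2, mask_ratio=0.5, seed=None):
    """Partition a point cloud into non-overlapping voxels and randomly
    mask (drop) a fraction of them. Returns a boolean array of shape (N,)
    that is True for points lying in the visible (unmasked) voxels."""
    rng = np.random.default_rng(seed)
    # Assign each point to a voxel via integer grid coordinates.
    voxel_ids = np.floor(points / voxel_size).astype(np.int64)      # (N, 3)
    # Map identical voxel coordinates to a shared voxel index.
    _, inverse = np.unique(voxel_ids, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_voxels = int(inverse.max()) + 1
    # Voxels are disjoint by construction, so masked regions never overlap.
    masked = rng.choice(n_voxels, size=int(mask_ratio * n_voxels),
                        replace=False)
    return ~np.isin(inverse, masked)

# Usage: split a cloud into visible input and masked reconstruction targets.
pts = np.random.rand(1024, 3).astype(np.float32)
vis = voxel_random_mask(pts, voxel_size=0.25, mask_ratio=0.6, seed=0)
visible_pts, masked_pts = pts[vis], pts[~vis]
```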
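In the same spirit, a hedged PyTorch sketch of what a lightweight, symmetrically structured autoencoder for keypoint reconstruction could look like. The class name, layer widths, and PointNet-style max pooling are assumptions for illustration; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class TinyPointAutoencoder(nn.Module):
    """Symmetric encoder-decoder: a shared per-point MLP is max-pooled into
    a global latent code, and a mirrored MLP regresses n_out xyz points."""

    def __init__(self, n_out=128, latent=256):
        super().__init__()
        self.n_out = n_out
        self.encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, latent),
        )
        # Decoder mirrors the encoder widths, keeping the network lightweight.
        self.decoder = nn.Sequential(
            nn.Linear(latent, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_out * 3),
        )

    def forward(self, x):
        # x: (B, N, 3) visible points; max over N is a permutation-invariant
        # aggregation of per-point features into one global code.
        z = self.encoder(x).max(dim=1).values            # (B, latent)
        return self.decoder(z).view(-1, self.n_out, 3)   # (B, n_out, 3)

# Usage: reconstruct a coarse keypoint set from the visible (unmasked) subset.
model = TinyPointAutoencoder()
recon = model(torch.randn(4, 512, 3))  # -> (4, 128, 3)
```

Trained against the full cloud (e.g., with a Chamfer distance loss), such a network would push the reconstructed keypoints toward distinctive structure, matching the abstract's intuition that masking mitigates redundancy.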

     

