CurveNet: Curvature-Based Multitask Learning Deep Networks for 3D Object Recognition

A. A. M. Muzahid; Wanggen Wan; Ferdous Sohel; Lianyao Wu; Li Hou

doi:10.1109/JAS.2020.1003324

Volume 8 Issue 6

Jun. 2021

IEEE/CAA Journal of Automatica Sinica



A. A. M. Muzahid, W. G. Wan, F. Sohel, L. Y. Wu, and L. Hou, "CurveNet: Curvature-Based Multitask Learning Deep Networks for 3D Object Recognition," IEEE/CAA J. Autom. Sinica, vol. 8, no. 6, pp. 1177-1187, Jun. 2021. doi: 10.1109/JAS.2020.1003324



1. School of Communications and Information Engineering, Institute of Smart City, Shanghai University, Shanghai 200444, China
2. Discipline of Information Technology, Murdoch University, Murdoch WA 6150, Australia
3. School of Information Engineering, Huangshan University, Huangshan 245041, China

Funds: This paper was partially supported by a project of the Shanghai Science and Technology Committee (18510760300), Anhui Natural Science Foundation (1908085MF178), and Anhui Excellent Young Talents Support Program Project (gxyqZD2019069)


Author Bio:
A. A. M. Muzahid is a Ph.D. candidate in communication and information systems at Shanghai University. He received the M.E. degree in communication and information engineering from Chongqing University of Posts and Telecommunications in 2016, and the B.Sc. degree in electronics and telecommunications engineering from Daffodil International University, Bangladesh, in 2011. He is a Research Member at the Institute of Smart City, Shanghai University, and leads the “Computer Vision and 3D Virtual Reality” research team. His current research interests include 3D shape analysis, computer vision, and 3D virtual reality.

Wanggen Wan (M’00–SM’06) is a Professor at the School of Communication and Information Engineering, Shanghai University (SHU). He received the Ph.D. degree from Xidian University in 1992. He was a Postdoctoral Research Fellow at the Information and Control Engineering Department of Xi’an Jiao-Tong University from 1993 to 1995. From 1995 to 1998, he worked at the Electronic and Information Engineering Department of SHU. He was a Visiting Scholar at the Electrical and Electronic Engineering Department of HKUST in 1998 and 1999. He was a Visiting Professor and Section Head at the Multimedia Innovation Center of HKPU from 2000 to 2004. He became an IET Fellow in 2006. His research interests include computer graphics, computer vision, signal processing, and data mining.

Ferdous Sohel (M’08–SM’14) received the Ph.D. degree from Monash University, Australia. He is currently an Associate Professor in information technology at Murdoch University. He previously worked as a Research Assistant Professor at the School of Computer Science and Software Engineering, The University of Western Australia. His research interests include computer vision, machine learning, pattern recognition, and digital agriculture. He is a recipient of the prestigious Discovery Early Career Research Award funded by the Australian Research Council. He is an Associate Editor of IEEE Transactions on Multimedia and IEEE Signal Processing Letters. He is a Member of the Australian Computer Society and a Senior Member of the IEEE.

Lianyao Wu received the B.E. degree in communication engineering from the School of Communication and Information Engineering, Shanghai University in 2017. He is now a master's student at Shanghai University. He is a Research Member at the Institute of Smart City, Shanghai University. His current research interests include augmented reality, virtual reality, and computer graphics.

Li Hou received the B.S. degree in communication engineering and M.S. degree in power electronics from Liaoning University of Technology in 2003 and 2006, respectively. She joined the School of Information Engineering of Huangshan University in 2006. She received the Ph.D. degree in communication and information systems from Shanghai University in 2017. Her current research interests include computer vision, machine learning, and video/image processing.
Corresponding author: A. A. M. Muzahid, e-mail: muzahid@shu.edu.cn
¹https://github.com/MEPP-team/MEPP
²http://www.acfr.usyd.edu.au/papers/SydneyUrbanObjectsDataset.shtml.
Received Date: 2020-05-21
Accepted Date: 2020-06-24

Available Online: 2020-07-07

Abstract


In computer vision, 3D object recognition is one of the most important tasks for many real-world applications. Three-dimensional convolutional neural networks (CNNs) have demonstrated their advantages in 3D object recognition. In this paper, we propose to use the principal curvature directions of 3D objects (using a CAD model) to represent the geometric features as inputs for the 3D CNN. Our framework, named CurveNet, learns perceptually relevant salient features and predicts object class labels. Curvature directions incorporate complex surface information of a 3D object, which helps our framework produce more precise and discriminative features for object recognition. Multitask learning is motivated by sharing features between two related tasks; we treat pose classification as an auxiliary task so that CurveNet generalizes better on object label classification. Experimental results show that our proposed framework performs better with curvature vectors than with voxels as the input for 3D object classification. We further improved the performance of CurveNet by combining two networks that take both the curvature directions and the voxels of a 3D object as inputs. A Cross-Stitch module was adopted to learn effective shared features across multiple representations. We evaluated our methods on three publicly available datasets and achieved competitive performance in the 3D object recognition task.
- 3D shape analysis
- convolutional neural network
- DNNs
- object classification
- volumetric CNN
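
The abstract describes pose classification as an auxiliary task trained alongside object label classification. The following is a minimal NumPy sketch of how such a two-head objective might be combined, not the authors' implementation. The function names, the toy layer sizes, the number of pose bins, and the `aux_weight` factor are all assumptions made for illustration; the source does not state how the two losses are weighted.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def multitask_loss(features, w_class, w_pose, class_labels, pose_labels,
                   aux_weight=0.5):
    """Weighted sum of the main (object class) and auxiliary (pose) losses.

    Both linear heads read the same shared features, so during training the
    pose task would also influence the shared representation.
    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    class_loss = softmax_cross_entropy(features @ w_class, class_labels)
    pose_loss = softmax_cross_entropy(features @ w_pose, pose_labels)
    return class_loss + aux_weight * pose_loss, class_loss, pose_loss


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 16))         # 4 objects, 16 shared features (toy sizes)
w_class = rng.normal(size=(16, 10)) * 0.1   # 10 hypothetical object classes
w_pose = rng.normal(size=(16, 12)) * 0.1    # 12 hypothetical pose bins
class_labels = np.array([0, 3, 7, 9])
pose_labels = np.array([1, 1, 5, 11])

total, class_loss, pose_loss = multitask_loss(
    features, w_class, w_pose, class_labels, pose_labels
)
print(f"total={total:.4f} class={class_loss:.4f} pose={pose_loss:.4f}")
```

Setting `aux_weight` to 0 recovers single-task training on object labels, which makes it easy to compare the two regimes in an ablation.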




Highlights

CurveNet is a novel volumetric CNN for 3D object classification.
It takes curvature point features as input.
It applies network parameter sharing and shows that soft sharing achieves the best results.
It achieves state-of-the-art results among voxel-representation methods.
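
The soft parameter sharing mentioned in the highlights corresponds to the Cross-Stitch module named in the abstract. As a rough illustration, the sketch below assumes the standard cross-stitch formulation of Misra et al. (2016), in which each branch's activation is replaced by a learned linear combination of both branches' activations. The function name, tensor shapes, and mixing values are hypothetical, and the mixing matrix is fixed here, whereas in training it would be learned.

```python
import numpy as np


def cross_stitch(x_a, x_b, alpha):
    """Mix two same-shaped activations with a 2x2 matrix `alpha`.

    Row 0 of alpha holds the weights for the new branch-A activation,
    row 1 the weights for the new branch-B activation.
    """
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b


rng = np.random.default_rng(0)
# Hypothetical activations: (batch, channels, depth, height, width)
curv_feat = rng.normal(size=(2, 8, 4, 4, 4))    # curvature-input branch
voxel_feat = rng.normal(size=(2, 8, 4, 4, 4))   # voxel-input branch

# Identity mixing: the branches stay independent (no sharing).
same_a, same_b = cross_stitch(curv_feat, voxel_feat, np.eye(2))

# Soft sharing: each branch keeps most of its own signal and borrows
# a little from the other (values chosen only for illustration).
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
mixed_a, mixed_b = cross_stitch(curv_feat, voxel_feat, alpha)
print(mixed_a.shape, mixed_b.shape)
```

Because the unit is a plain linear combination, it leaves tensor shapes unchanged and can be placed between any pair of matching layers in the two networks.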
