Axial Assembled Correspondence Network for Few-Shot Semantic Segmentation

Yu Liu; Bin Jiang; Jiaming Xu

doi:10.1109/JAS.2022.105863

Volume 10 Issue 3

Mar. 2023

IEEE/CAA Journal of Automatica Sinica

JCR Impact Factor: 15.3, Top 1 (SCI Q1)

CiteScore: 23.5, Top 2% (Q1)
Google Scholar h5-index: 77， TOP 5

Turn off MathJax

Article Contents

Article Navigation > IEEE/CAA Journal of Automatica Sinica > 2023 > 10(3): 711-721

Y. Liu, B. Jiang, and J. M. Xu, “Axial assembled correspondence network for few-shot semantic segmentation,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 711–721, Mar. 2023. doi: 10.1109/JAS.2022.105863

Citation:

Y. Liu, B. Jiang, and J. M. Xu, “Axial assembled correspondence network for few-shot semantic segmentation,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 711–721, Mar. 2023. doi: 10.1109/JAS.2022.105863

Citation:

PDF( 2218 KB)

Axial Assembled Correspondence Network for Few-Shot Semantic Segmentation

doi: 10.1109/JAS.2022.105863

Yu Liu^{1
,
,},
Bin Jiang^2
,,
Jiaming Xu^2
,

1.
School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, and with the R&D Center of Precision Electronic Manufacturing Technology, Guangzhou Institute of Modern Industrial Technology, and also with the High Perfor-mance Motor and Intelligent Control Engineering Technology Research Center of Guangdong Province, Guangzhou 511458, China
2.
School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China

Funds: This work was supported in part by the Key Research and Development Program of Guangdong Province (2021B0101200001) and the Guangdong Basic and Applied Basic Research Foundation (2020B1515120071)

More Information

Author Bio:
Yu Liu (Senior Member, IEEE) received the Ph.D. degree in automatic control from South China University of Technology, in 2009.He was a visiting student with the Department of Mechanical Engineering, Concordia University, Canada, from 2008 to 2009, and a Visiting Scholar with the Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, USA, from 2017 to 2018. He is a Professor with the School of Automation Science and Engineering, South China University of Technology, also with the R&D Center of Precision Electronic Manufacturing Technology, Guangzhou Institute of Modern Industrial Technology, and also with the High Performance Motor and Intelligent Control Engineering Technology Research Center of Guangdong Province. He is currently an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Fuzzy Systems and IEEE Transactions on Computational Social Systems. His research interests include distributed parameter systems, intelligent control and robot control, learning control, intelligent perception and decision making, and machine vision

Bin Jiang received the B.E. degree in automation from Guangdong University of Technology, in 2018. He is currently pursuing the M.E. degree in electronic information with the School of Automation Science and Engineering, South China University of Technology. His research interests include pattern recognition, deep learning, and computer vision

Jiaming Xu (Member, IEEE) received the M.E. degree in mechanical and electronic engineering from Guangzhou University, in 2018. He received the Ph.D. degree in control science and engineering at the School of Automation Science and Engineering, South China University of Technology. His research interests include pattern recognition, computer vision, and intelligent control
Corresponding author: Yu Liu, e-mail: auylau@scut.edu.cn
Received Date: 2022-06-14
Accepted Date: 2022-07-22

Available Online: 2022-08-08

Abstract

Abstract

Few-shot semantic segmentation aims at training a model that can segment novel classes in a query image with only a few densely annotated support exemplars. It remains a challenge because of large intra-class variations between the support and query images. Existing approaches utilize 4D convolutions to mine semantic correspondence between the support and query images. However, they still suffer from heavy computation, sparse correspondence, and large memory. We propose axial assembled correspondence network (AACNet) to alleviate these issues. The key point of AACNet is the proposed axial assembled 4D kernel, which constructs the basic block for semantic correspondence encoder (SCE). Furthermore, we propose the deblurring equations to provide more robust correspondence for the aforementioned SCE and design a novel fusion module to mix correspondences in a learnable manner. Experiments on PASCAL-5ⁱ reveal that our AACNet achieves a mean intersection-over-union score of 65.9 % for 1-shot segmentation and 70.6 % for 5-shot segmentation, surpassing the state-of-the-art method by 5.8 % and 5.0 % respectively.
- Artificial intelligence,
- computer vision,
- deep convolutional neural network,
- few-shot semantic segmentation

FullText(HTML)

References(51)

References

[1]	J. M. Xu and Y. Liu, “Multiple guidance network for industrial product surface inspection with one labeled target sample,” IEEE Trans. Neural Netw. Learn. Syst., 2022, DOI: 10.1109/TNNLS.2022.3165575.
[2]	I. Ahmed, S. Din, G. Jeon, F. Piccialli, and G. Fortino, “Towards collaborative robotics in top view surveillance: A framework for multiple object tracking by detection using deep learning,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 7, pp. 1253–1270, Jul. 2021. doi: 10.1109/JAS.2020.1003453
[3]	Y. Liu, J. M. Xu, and Y. L. Wu, “A CISG method for internal defect detection of solar cells in different production processes,” IEEE Trans. Ind. Electron., vol. 69, no. 8, pp. 8452–8462, Aug. 2022. doi: 10.1109/TIE.2021.3104584
[4]	L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018. doi: 10.1109/TPAMI.2017.2699184
[5]	E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, Apr. 2017. doi: 10.1109/TPAMI.2016.2572683
[6]	J. H. Dong, Y. Cong, G. Sun, B. N. Zhong, and X. W. Xu, “What can be transferred: Unsupervised domain adaptation for endoscopic lesions segmentation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 4022–4031.
[7]	K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 770–778.
[8]	G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 2261–2269.
[9]	J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and F. F. Li, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Miami, USA, 2009, pp. 248–255.
[10]	S. D. Li, K. Han, T. W. Costain, H. Howard-Jenkins, and V. Prisacariu, “Correspondence networks with adaptive neighbourhood consensus,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 10193–10202.
[11]	C. B. Choy, J. Y. Gwak, S. Savarese, and M. Chandraker, “Universal correspondence network,” in Proc. 30th Int. Conf. Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 2414–2422.
[12]	K. Han, R. S. Rezende, B. Ham, K. Y. K. Wong, M. Cho, C. Schmid, and J. Ponce, “SCNet: Learning semantic correspondence”, in Proc. IEEE Int. Conf. Computer Vision, Venice, Italy, 2017, pp. 1849–1858.
[13]	I. Rocco, M. Cimpoi, R. Arandjelovicć, A. Torii, T. Pajdla, and J. Sivic, “Neighbourhood consensus networks,” in Proc. 32nd Int. Conf. Neural Information Processing Systems, Montréal, Canada, 2018, pp. 1658–1669.
[14]	J. Long, N. Zhang, and T. Darrell, “Do convnets learn correspondence?” in Proc. 28th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 1601–1609.
[15]	Y. B. Liu, L. C. Zhu, M. Yamada, and Y. Yang, “Semantic correspondence as an optimal transport problem,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 4462–4471.
[16]	G. S. Yang and D. Ramanan, “Volumetric correspondence networks for optical flow,” in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 794–805.
[17]	J. Min, D. Kang, and M. Cho, “Hypercorrelation squeeze for few-shot segmenation,” in Proc. IEEE/CVF Int. Conf. Computer Vision, Montréal, Canada, 2021, pp. 6921–6932.
[18]	A. Shaban, S. Bansal, Z. Liu, I. Essa, and B. Boots, “One-shot learning for semantic segmentation,” in Proc. British Machine Vision Conf., London, UK, 2017, pp. 1–9.
[19]	K. Nguyen and S. Todorovic, “Feature weighting and boosting for few-shot segmentation,” in Proc. IEEE/CVF Int. Conf. Computer Vision, Seoul, Korea (South), 2019, pp. 622–631.
[20]	H. S. Zhao, J. P. Shi, X. J. Qi, X. G. Wang, and J. Y. Jia, “Pyramid scene parsing network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 6230–6239.
[21]	O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 2015, pp. 234–241.
[22]	Y. F. Xia, H. Yu, and F.-Y. Wang, “Accurate and robust eye center localization via fully convolutional networks,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 5, pp. 1127–1138, Sept. 2019. doi: 10.1109/JAS.2019.1911684
[23]	K. H. Liu, Z. H. Ye, H. Y. Guo, D. P. Cao, L. Chen, and F.-Y. Wang, “FISS GAN: A generative adversarial network for foggy image semantic segmentation,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1428–1439, Aug. 2021. doi: 10.1109/JAS.2021.1004057
[24]	O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra, “Matching networks for one shot learning,” in Proc. 30th Int. Conf. Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 3637–3645.
[25]	J. Snell, K. Swersky, and R. S. Zemel, “Prototypical networks for few-shot learning,” in Proc. 31st Int. Conf. Neural Information Processing Systems, Long Beach, USA, 2017, pp. 4080–4090.
[26]	C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 1126–1135.
[27]	Q. Cai, Y. W. Pan, T. Yao, C. G. Yan, and T. Mei, “Memory matching networks for one-shot image recognition,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 4080–4088.
[28]	X. L. Zhang, Y. C. Wei, Y. Yang, and T. S. Huang, “SG-One: Similarity guidance network for one-shot semantic segmentation,” IEEE Trans. Cybern., vol. 50, no. 9, pp. 3855–3865, Sept. 2020. doi: 10.1109/TCYB.2020.2992433
[29]	K. X. Wang, J. H. Liew, Y. T. Zou, D. Q. Zhou, and J. S. Feng, “PANet: Few-shot image semantic segmentation with prototype alignment,” in Proc. IEEE/CVF Int. Conf. Computer Vision, Seoul, Korea (South), 2019, pp. 9196–9205.
[30]	C. Zhang, G. S. Lin, F. Y. Liu, R. Yao, and C. H. Shen, “CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 5212–5221.
[31]	Z. T. Tian, H. S. Zhao, M. Shu, Z. C. Yang, R. Y. Li, and J. Y. Jia, “Prior guided feature enrichment network for few-shot segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 2, pp. 1050–1065, Feb. 2022. doi: 10.1109/TPAMI.2020.3013717
[32]	W. Liu, A. Rabinovich, and A. C. Berg, “ParseNet: Looking wider to see better,” arXiv preprint arXiv: 1506.04579, 2015.
[33]	H. C. Wang, X. D. Zhang, Y. T. Hu, Y. D. Yang, X. B. Cao, and X. T. Zhen, “Few-shot semantic segmentation with democratic attention networks,” in Proc. 16th European Conf. Computer Vision, Glasgow, UK, 2020, pp. 730–746.
[34]	C. Zhang, G. S. Lin, F. Y. Liu, J. S. Guo, Q. Y. Wu, and R. Yao, “Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation,” in Proc. IEEE/CVF Int. Conf. Computer Vision, Seoul, Korea (South), 2019, pp. 9586–9594.
[35]	D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004. doi: 10.1023/B:VISI.0000029664.99615.94
[36]	N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, San Diego, USA, 2005, pp. 886–893.
[37]	S. Kim, D. Min, B. Ham, S. Lin, and K. Sohn, “FCSS: Fully convolutional self-similarity for dense semantic correspondence,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 3, pp. 581–595, Mar. 2019. doi: 10.1109/TPAMI.2018.2803169
[38]	Y. X. Wu and K. M. He, “Group normalization,” Int. J. Comput. Vis., vol. 128, no. 3, pp. 742–755, Mar. 2020. doi: 10.1007/s11263-019-01198-w
[39]	M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes challenge: A retrospective,” Int. J. Comput. Vis., vol. 111, no. 1, pp. 98–136, Jan. 2015. doi: 10.1007/s11263-014-0733-5
[40]	B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, “Simultaneous detection and segmentation,” in Proc. 13th European Conf. Computer Vision, Zurich, Switzerland, 2014, pp. 297–312.
[41]	X. Li, T. H. Wei, Y. P. Chen, Y. W. Tai, and C. K. Tang, “FSS-1000: A 1000-class dataset for few-shot segmentation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 2866–2875.
[42]	D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. 3rd Int. Conf. Learning Representations, San Diego, USA, 2015.
[43]	K. Rakelly, E. Shelhamer, T. Darrell, A. A. Efros, and S. Levine, “Conditional networks for few-shot semantic segmentation,” in Proc. 6th Int. Conf. Learning Representations, Vancouver, Canada, 2018.
[44]	G. Li, V. Jampani, L. Sevilla-Lara, D. Q. Sun, J. Kim, and J. Kim, “Adaptive prototype learning and allocation for few-shot segmentation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Nashville, USA, 2021, pp. 8330–8339.
[45]	M. Boudiaf, H. Kervadec, Z. I. Masud, P. Piantanida, I. B. Ayed, and J. Dolz, “Few-shot segmentation without meta-learning: A good transductive inference is all you need?” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Nashville, USA, 2021, pp. 13974–13983.
[46]	T. Hu, P. W. Yang, C. L. Zhang, G. Yu, Y. D. Mu, and C. G. M. Snoek, “Attention-based multi-context guiding for few-shot semantic segmentation,” in Proc. Proc. 33rd AAAI Conf. Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conf. and 9th AAAI Symp. Educational Advances in Artificial Intelligence, Honolulu, USA, 2019, pp. 8441–8448.
[47]	Y. F. Liu, X. Y. Zhang, S. Y. Zhang, and X. M. He, “Part-aware prototype network for few-shot semantic segmentation,” in Proc. 16th European Conf. Computer Vision, Glasgow, UK, 2020, pp. 142–158.
[48]	B. Y. Yang, C. Liu, B. H. Li, J. B. Jiao, and Q. X. Ye, “Prototype mixture models for few-shot semantic segmentation,” in Proc. 16th European Conf. Computer Vision, Glasgow, UK, 2020, pp. 763–778.
[49]	T. Y. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Proc. 13th European Conf. Computer Vision, Zurich, Switzerland, 2014, pp. 740–755.
[50]	K. Rakelly, E. Shelhamer, T. Darrell, A. A. Efros, and S. Levine, “Few-shot segmentation propagation with guided networks,” arXiv preprint arXiv: 1806.07373, 2018.
[51]	R. Azad, A. R. Fayjie, C. Kauffmann, I. B. Ayed, M. Pedersoli, and J. Dolz, “On the texture bias for few-shot CNN segmentation,” in Proc. IEEE Winter Conf. Applications of Computer Vision, Waikoloa, USA, 2021, pp. 2673–2682.

Supplements(0)

Cited By

Proportional views

Proportional views

通讯作者: 陈斌, bchen63@163.com

1.
沈阳化工大学材料科学与工程学院沈阳 110142

Figures(5) / Tables(8)

Get Citation

PDF

XML

Article Metrics

Article views (804) PDF downloads(90)

Highlights

We develop a novel 4d kernel (AA-Conv4d). It conducts an appropriate weight-sparsification while keeping sufficient communications between the support and the query subspaces
We propose a simple but effective preprocessing module to modify the statistical distribution of the semantic correspondences, which can effectively improve segmentation performance
By mixing pyramid correspondences with a learnable concatenation operation, our FM helps adaptively refine the squeezed correspondences for query segmentation
This work achieves a mean Intersection-over-Union score of $65.9\%$ and $70.6\%$ on PASCAL-5$^i$ for 1-shot and 5-shot settings respectively, outperforming state-of-the-art results by $5.8\%$ and $5.0\%$ respectively

Axial Assembled Correspondence Network for Few-Shot Semantic Segmentation

doi: 10.1109/JAS.2022.105863

Abstract

References

Proportional views

Catalog

通讯作者: 陈斌, bchen63@163.com

Article Metrics

Highlights

Export File

Citation

Format

Content