A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 9 Issue 12
Dec.  2022

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
    CiteScore: 17.6, Top 3% (Q1)
    Google Scholar h5-index: 77, TOP 5
Citation: L. F. Tang, Y. X. Deng, Y. Ma, J. Huang, and J. Y. Ma, "SuperFusion: A versatile image registration and fusion network with semantic awareness," IEEE/CAA J. Autom. Sinica, vol. 9, no. 12, pp. 2121–2137, Dec. 2022. doi: 10.1109/JAS.2022.106082

SuperFusion: A Versatile Image Registration and Fusion Network with Semantic Awareness

doi: 10.1109/JAS.2022.106082
Funds:

  • National Natural Science Foundation of China (62276192, 62075169, 62061160370)
  • Key Research and Development Program of Hubei Province (2020BAB113)

  • Image fusion aims to integrate complementary information from source images into a single fused image that comprehensively characterizes the imaging scene. However, existing image fusion algorithms are applicable only to strictly aligned source images and produce severe artifacts in the fusion results when the input images have slight shifts or deformations. In addition, the fusion results typically offer good visual quality but neglect the semantic requirements of high-level vision tasks. This study incorporates image registration, image fusion, and the semantic requirements of high-level vision tasks into a single framework and proposes a novel image registration and fusion method, named SuperFusion. Specifically, we design a registration network that estimates bidirectional deformation fields to rectify geometric distortions of the input images under the supervision of both photometric and end-point constraints. Registration and fusion are combined in a symmetric scheme: mutual promotion is achieved by optimizing the naive fusion loss and is further enhanced by the mono-modal consistent constraint on the symmetric fusion outputs. In addition, the image fusion network is equipped with a global spatial attention mechanism to achieve adaptive feature integration. Moreover, a semantic constraint based on a pre-trained segmentation model and the Lovász-Softmax loss is deployed to guide the fusion network to focus more on the semantic requirements of high-level vision tasks. Extensive experiments on image registration, image fusion, and semantic segmentation tasks demonstrate the superiority of SuperFusion over state-of-the-art alternatives. The source code and pre-trained model are publicly available at https://github.com/Linfeng-Tang/SuperFusion.

     

  • Recommended by Associate Editor Qing-Long Han. (Linfeng Tang and Yuxin Deng contributed equally to this work.)
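
The abstract says the registration network is supervised by photometric and end-point constraints on the estimated deformation fields. The sketch below illustrates generic versions of these ideas in plain Python and is not taken from the SuperFusion code. All function names (`bilinear_sample`, `warp`, `end_point_error`, `photometric_error`) are hypothetical, images are assumed to be small 2-D lists of intensities, and a deformation field is assumed to store one `(dx, dy)` displacement per pixel.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a 2-D list-of-lists image at real-valued (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp(img, flow):
    """Backward warp: output[y][x] is img sampled at (x + dx, y + dy)."""
    h, w = len(img), len(img[0])
    return [
        [bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1]) for x in range(w)]
        for y in range(h)
    ]


def end_point_error(flow_pred, flow_ref):
    """Mean Euclidean distance between predicted and reference displacement vectors."""
    total, count = 0.0, 0
    for row_p, row_r in zip(flow_pred, flow_ref):
        for (px, py), (rx, ry) in zip(row_p, row_r):
            total += math.hypot(px - rx, py - ry)
            count += 1
    return total / count


def photometric_error(img_a, img_b):
    """Mean absolute intensity difference between two images of equal size."""
    total, count = 0.0, 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy 4x4 image whose intensity equals the column index.
    image = [[float(x) for x in range(4)] for _ in range(4)]
    # Assumed deformation field: every pixel looks one column to the right.
    shift_right = [[(1.0, 0.0) for _ in range(4)] for _ in range(4)]
    zero_field = [[(0.0, 0.0) for _ in range(4)] for _ in range(4)]

    warped = warp(image, shift_right)
    print("first warped row:", warped[0])
    print("end-point error vs. zero field:", end_point_error(shift_right, zero_field))
    print("photometric error after warping:", photometric_error(image, warped))
```

In a learned setting the same two quantities would be computed on tensors and minimized by gradient descent. How the paper defines and weights its constraints, particularly across infrared and visible modalities, is not stated in the abstract and is not reproduced here.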
