IEEE/CAA Journal of Automatica Sinica
Citation: L. F. Tang, Y. X. Deng, Y. Ma, J. Huang, and J. Y. Ma, "SuperFusion: A versatile image registration and fusion network with semantic awareness," IEEE/CAA J. Autom. Sinica, vol. 9, no. 12, pp. 2021–2137, Dec. 2022. doi: 10.1109/JAS.2022.106082
[1]
X. Zhang, "Deep learning-based multi-focus image fusion: A survey and a comparative study," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 4819–4838, 2022.

[2]
J. Ma, Y. Ma, and C. Li, "Infrared and visible image fusion methods and applications: A survey," Inf. Fusion, vol. 45, pp. 153–178, 2019. doi: 10.1016/j.inffus.2018.02.004

[3]
X. Zhang, P. Ye, and G. Xiao, "VIFB: A visible and infrared image fusion benchmark," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2020, pp. 104–105.

[4]
H. Zhang, H. Xu, X. Tian, J. Jiang, and J. Ma, "Image fusion meets deep learning: A survey and perspective," Inf. Fusion, vol. 76, pp. 323–336, 2021. doi: 10.1016/j.inffus.2021.06.008

[5]
L. Tang, H. Zhang, H. Xu, and J. Ma, "Deep learning-based image fusion: A survey," J. Image Graph., 2022.

[6]
Q. Ha, K. Watanabe, T. Karasawa, Y. Ushiku, and T. Harada, "MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes," in Proc. IEEE Int. Conf. Intell. Rob. Syst., 2017, pp. 5108–5115.

[7]
X. Zhang, P. Ye, H. Leung, K. Gong, and G. Xiao, "Object fusion tracking based on visible and infrared images: A comprehensive review," Inf. Fusion, vol. 63, pp. 166–187, 2020. doi: 10.1016/j.inffus.2020.05.002

[8]
M. Yin, J. Pang, Y. Wei, and P. Duan, "Image fusion algorithm based on non-subsampled dual-tree complex contourlet transform and compressive sensing pulse coupled neural network," J. Comput. Aided Des. Comput. Graph., no. 3, pp. 411–419, 2016.

[9]
J. Chen, X. Li, L. Luo, X. Mei, and J. Ma, "Infrared and visible image fusion based on target-enhanced multiscale transform decomposition," Inf. Sci., vol. 508, pp. 64–78, 2020. doi: 10.1016/j.ins.2019.08.066

[10]
H. Li, X. J. Wu, and J. Kittler, "MDLatLRR: A novel decomposition method for infrared and visible image fusion," IEEE Trans. Image Process., vol. 29, pp. 4733–4746, 2020. doi: 10.1109/TIP.2020.2975984

[11]
Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, "Image fusion with convolutional sparse representation," IEEE Signal Process. Lett., vol. 23, no. 12, pp. 1882–1886, 2016. doi: 10.1109/LSP.2016.2618776

[12]
Z. Fu, X. Wang, J. Xu, N. Zhou, and Y. Zhao, "Infrared and visible images fusion based on RPCA and NSCT," Infrared Phys. Technol., vol. 77, pp. 114–123, 2016. doi: 10.1016/j.infrared.2016.05.012

[13]
J. Ma, Z. Zhou, B. Wang, and H. Zong, "Infrared and visible image fusion based on visual saliency map and weighted least square optimization," Infrared Phys. Technol., vol. 82, pp. 8–17, 2017. doi: 10.1016/j.infrared.2017.02.005

[14]
J. Ma, C. Chen, C. Li, and J. Huang, "Infrared and visible image fusion via gradient transfer and total variation minimization," Inf. Fusion, vol. 31, pp. 100–109, 2016. doi: 10.1016/j.inffus.2016.02.001

[15]
W. Zhao, H. Lu, and D. Wang, "Multisensor image fusion and enhancement in spectral total variation domain," IEEE Trans. Multimed., vol. 20, no. 4, pp. 866–879, 2017.

[16]
H. Li and X. J. Wu, "DenseFuse: A fusion approach to infrared and visible images," IEEE Trans. Image Process., vol. 28, no. 5, pp. 2614–2623, 2019. doi: 10.1109/TIP.2018.2887342

[17]
H. Xu, H. Zhang, and J. Ma, "Classification saliency-based rule for visible and infrared image fusion," IEEE Trans. Comput. Imaging, vol. 7, pp. 824–836, 2021. doi: 10.1109/TCI.2021.3100986

[18]
J. Liu, X. Fan, J. Jiang, R. Liu, and Z. Luo, "Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion," IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 1, pp. 105–119, 2022. doi: 10.1109/TCSVT.2021.3056725

[19]
F. Zhao, W. Zhao, L. Yao, and Y. Liu, "Self-supervised feature adaption for infrared and visible image fusion," Inf. Fusion, vol. 76, pp. 189–203, 2021. doi: 10.1016/j.inffus.2021.06.002

[20]
J. Ma, L. Tang, M. Xu, H. Zhang, and G. Xiao, "STDFusionNet: An infrared and visible image fusion network based on salient target detection," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–13, 2021.

[21]
Y. Liu, Y. Shi, F. Mu, J. Cheng, C. Li, and X. Chen, "Multimodal MRI volumetric data fusion with convolutional neural networks," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–15, 2022.

[22]
L. Tang, J. Yuan, and J. Ma, "Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network," Inf. Fusion, vol. 82, pp. 28–42, 2022. doi: 10.1016/j.inffus.2021.12.004

[23]
J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, "FusionGAN: A generative adversarial network for infrared and visible image fusion," Inf. Fusion, vol. 48, pp. 11–26, 2019. doi: 10.1016/j.inffus.2018.09.004

[24]
J. Ma, H. Zhang, Z. Shao, P. Liang, and H. Xu, "GANMcC: A generative adversarial network with multiclassification constraints for infrared and visible image fusion," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–14, 2021.

[25]
J. Liu, X. Fan, Z. Huang, G. Wu, R. Liu, W. Zhong, and Z. Luo, "Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, pp. 5802–5811.

[26]
Y. Yang, J. Liu, S. Huang, W. Wan, W. Wen, and J. Guan, "Infrared and visible image fusion via texture conditional generative adversarial network," IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 12, pp. 4771–4783, 2021. doi: 10.1109/TCSVT.2021.3054584

[27]
J. Ma, L. Tang, F. Fan, J. Huang, X. Mei, and Y. Ma, "SwinFusion: Cross-domain long-range learning for general image fusion via Swin transformer," IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1200–1217, 2022. doi: 10.1109/JAS.2022.105686

[28]
W. Tang, F. He, and Y. Liu, "YDTR: Infrared and visible image fusion via Y-shape dynamic transformer," IEEE Trans. Multimed., 2022.

[29]
J. Li, J. Zhu, C. Li, X. Chen, and B. Yang, "CGTF: Convolution-guided transformer for infrared and visible image fusion," IEEE Trans. Instrum. Meas., vol. 71, p. 5012314, 2022.

[30] 
D. Wang, J. Liu, X. Fan, and R. Liu, "Unsupervised misaligned infrared and visible image fusion via cross-modality image generation and registration," in Proc. Int. Joint Conf. Artif. Intell., 2022.

[31]
H. Li, X. J. Wu, and T. Durrani, "NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models," IEEE Trans. Instrum. Meas., vol. 69, no. 12, pp. 9645–9656, 2020. doi: 10.1109/TIM.2020.3005230

[32]
H. Li, X. J. Wu, and J. Kittler, "RFN-Nest: An end-to-end residual fusion network for infrared and visible images," Inf. Fusion, vol. 73, pp. 72–86, 2021. doi: 10.1016/j.inffus.2021.02.023

[33]
H. Xu, X. Wang, and J. Ma, "DRF: Disentangled representation for visible and infrared image fusion," IEEE Trans. Instrum. Meas., vol. 70, p. 5006713, 2021.

[34]
M. Xu, L. Tang, H. Zhang, and J. Ma, "Infrared and visible image fusion via parallel scene and texture learning," Pattern Recognit., vol. 132, p. 108929, 2022. doi: 10.1016/j.patcog.2022.108929

[35]
Z. Zhao, S. Xu, C. Zhang, J. Liu, J. Zhang, and P. Li, "DIDFuse: Deep image decomposition for infrared and visible image fusion," in Proc. Int. Joint Conf. Artif. Intell., 2020, pp. 970–976.

[36]
F. Zhao and W. Zhao, "Learning specific and general realm feature representations for image fusion," IEEE Trans. Multimed., vol. 23, pp. 2745–2756, 2020.

[37]
L. Tang, J. Yuan, H. Zhang, X. Jiang, and J. Ma, "PIAFusion: A progressive infrared and visible image fusion network based on illumination aware," Inf. Fusion, vol. 83, pp. 79–92, 2022.

[38]
Y. Long, H. Jia, Y. Zhong, Y. Jiang, and Y. Jia, "RXDNFuse: An aggregated residual dense network for infrared and visible image fusion," Inf. Fusion, vol. 69, pp. 128–141, 2021. doi: 10.1016/j.inffus.2020.11.009

[39]
R. Liu, Z. Liu, J. Liu, and X. Fan, "Searching a hierarchically aggregated fusion architecture for fast multi-modality image fusion," in Proc. ACM Int. Conf. Multimed., 2021, pp. 1600–1608.

[40]
H. Zhang, H. Xu, Y. Xiao, X. Guo, and J. Ma, "Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity," in Proc. AAAI Conf. Artif. Intell., 2020, pp. 12797–12804.

[41]
H. Zhang and J. Ma, "SDNet: A versatile squeeze-and-decomposition network for real-time image fusion," Int. J. Comput. Vis., vol. 129, no. 10, pp. 2761–2785, 2021. doi: 10.1007/s11263-021-01501-8

[42]
H. Xu, J. Ma, Z. Le, J. Jiang, and X. Guo, "FusionDN: A unified densely connected network for image fusion," in Proc. AAAI Conf. Artif. Intell., 2020, pp. 12484–12491.

[43]
H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, "U2Fusion: A unified unsupervised image fusion network," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 502–518, 2022. doi: 10.1109/TPAMI.2020.3012548

[44]
Y. Zhang, Y. Liu, P. Sun, H. Yan, X. Zhao, and L. Zhang, "IFCNN: A general image fusion framework based on convolutional neural network," Inf. Fusion, vol. 54, pp. 99–118, 2020. doi: 10.1016/j.inffus.2019.07.011

[45]
J. Liu, R. Dian, S. Li, and H. Liu, "SGFusion: A saliency guided deep-learning framework for pixel-level image fusion," Inf. Fusion, 2022.

[46]
Y. Fu, T. Xu, X. Wu, and J. Kittler, "PPT fusion: Pyramid patch transformer for a case study in image fusion," arXiv, 2021.

[47]
V. VS, J. M. J. Valanarasu, P. Oza, and V. M. Patel, "Image fusion transformer," arXiv, 2021.

[48]
H. Zhao and R. Nie, "DNDT: Infrared and visible image fusion via DenseNet and dual-transformer," in Proc. Int. Conf. Inf. Technol. Biomed. Eng., 2021, pp. 71–75.

[49]
J. Ma, P. Liang, W. Yu, C. Chen, X. Guo, J. Wu, and J. Jiang, "Infrared and visible image fusion via detail preserving adversarial learning," Inf. Fusion, vol. 54, pp. 85–98, 2020. doi: 10.1016/j.inffus.2019.07.005

[50]
J. Ma, H. Xu, J. Jiang, X. Mei, and X. P. Zhang, "DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion," IEEE Trans. Image Process., vol. 29, pp. 4980–4995, 2020. doi: 10.1109/TIP.2020.2977573

[51]
J. Li, H. Huo, C. Li, R. Wang, and Q. Feng, "AttentionFGAN: Infrared and visible image fusion using attention-based generative adversarial networks," IEEE Trans. Multimed., vol. 23, pp. 1383–1396, 2020.

[52]
H. Zhou, W. Wu, Y. Zhang, J. Ma, and H. Ling, "Semantic-supervised infrared and visible image fusion via a dual-discriminator generative adversarial network," IEEE Trans. Multimed., 2021.

[53]
H. Zhang, J. Yuan, X. Tian, and J. Ma, "GAN-FM: Infrared and visible image fusion using GAN with full-scale skip connection and dual Markovian discriminators," IEEE Trans. Comput. Imaging, vol. 7, pp. 1134–1147, 2021. doi: 10.1109/TCI.2021.3119954

[54]
Y. Liu, Y. Shi, F. Mu, J. Cheng, and X. Chen, "Glioma segmentation-oriented multi-modal MR image fusion with adversarial learning," IEEE/CAA J. Autom. Sinica, vol. 9, no. 8, pp. 1528–1531, 2022. doi: 10.1109/JAS.2022.105770

[55]
Y. Liu, F. Mu, Y. Shi, and X. Chen, "SF-Net: A multi-task model for brain tumor segmentation in multimodal MRI via image fusion," IEEE Signal Process. Lett., vol. 29, pp. 1799–1803, 2022. doi: 10.1109/LSP.2022.3198594

[56]
H. Xu, J. Ma, J. Yuan, Z. Le, and W. Liu, "RFNet: Unsupervised network for mutually reinforcing multi-modal image registration and fusion," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, pp. 19679–19688.

[57]
J. Zhang and A. Rangarajan, "Affine image registration using a new information metric," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2004, pp. 1–8.

[58]
D. Rueckert, L. Sonoda, C. Hayes, D. Hill, M. Leach, and D. Hawkes, "Nonrigid registration using free-form deformations: Application to breast MR images," IEEE Trans. Med. Imaging, vol. 18, no. 8, pp. 712–721, 1999. doi: 10.1109/42.796284

[59] 
J. C. Yoo and T. H. Han, "Fast normalized cross-correlation," Circuits Syst. Signal Process., vol. 28, no. 6, pp. 819–843, 2009. doi: 10.1007/s00034-009-9130-7

[60]
F. Maes, D. Vandermeulen, and P. Suetens, "Medical image registration using mutual information," Proc. IEEE, vol. 91, no. 10, pp. 1699–1722, 2003. doi: 10.1109/JPROC.2003.817864

[61]
C. Aguilera, F. Barrera, F. Lumbreras, A. D. Sappa, and R. Toledo, "Multispectral image feature points," Sensors, vol. 12, no. 9, pp. 12661–12672, 2012. doi: 10.3390/s120912661

[62]
S. Kim, D. Min, B. Ham, M. N. Do, and K. Sohn, "DASC: Robust dense descriptor for multi-modal and multi-spectral correspondence estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 9, pp. 1712–1729, 2016.

[63]
J. Li, Q. Hu, and M. Ai, "RIFT: Multi-modal image matching based on radiation-variation insensitive feature transform," IEEE Trans. Image Process., vol. 29, pp. 3296–3310, 2019.

[64] 
G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. Guttag, and A. V. Dalca, "VoxelMorph: A learning framework for deformable medical image registration," IEEE Trans. Med. Imaging, vol. 38, no. 8, pp. 1788–1800, 2019.

[65]
M. Arar, Y. Ginger, D. Danon, A. H. Bermano, and D. Cohen-Or, "Unsupervised multi-modal image registration via geometry preserving image-to-image translation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2020, pp. 13410–13419.

[66]
S. Zhou, W. Tan, and B. Yan, "Promoting single-modal optical flow network for diverse cross-modal flow estimation," in Proc. AAAI Conf. Artif. Intell., 2022, pp. 3562–3570.

[67]
S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, and L. Jiao, "A deep learning framework for remote sensing image registration," ISPRS J. Photogramm. Remote Sens., vol. 145, pp. 148–164, 2018. doi: 10.1016/j.isprsjprs.2017.12.012

[68]
K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv, 2014.

[69]
S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, "Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2874–2883.

[70]
X. Hu, C. W. Fu, L. Zhu, J. Qin, and P. A. Heng, "Direction-aware spatial context features for shadow detection and removal," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 11, pp. 2795–2808, 2019.

[71]
T. Wang, X. Yang, K. Xu, S. Chen, Q. Zhang, and R. W. Lau, "Spatial attentive single-image deraining with a high quality real rain dataset," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 12270–12279.

[72]
S. Meister, J. Hur, and S. Roth, "UnFlow: Unsupervised learning of optical flow with a bidirectional census loss," in Proc. AAAI Conf. Artif. Intell., 2018, pp. 7251–7259.

[73]
Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004. doi: 10.1109/TIP.2003.819861

[74]
M. Berman, A. R. Triki, and M. B. Blaschko, "The Lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4413–4421.

[75]
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "PyTorch: An imperative style, high-performance deep learning library," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 8026–8037.

[76]
P. Truong, M. Danelljan, and R. Timofte, "GLU-Net: Global-local universal network for dense flow and correspondences," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2020, pp. 6258–6268.
