A journal of the IEEE and the CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation.
Volume 9, Issue 7, July 2022

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Jiayi Ma, Linfeng Tang, Fan Fan, Jun Huang, Xiaoguang Mei, and Yong Ma, "SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer," IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1200-1217, Jul. 2022. doi: 10.1109/JAS.2022.105686

SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer

doi: 10.1109/JAS.2022.105686
Funds:
  • the National Natural Science Foundation of China (62075169, 62003247, 62061160370)
  • the Key Research and Development Program of Hubei Province (2020BAB113)

More Information
  • This study proposes a novel general image fusion framework based on cross-domain long-range learning and the Swin Transformer, termed SwinFusion. On the one hand, an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction. More specifically, the proposed method involves an intra-domain fusion unit based on self-attention and an inter-domain fusion unit based on cross-attention, which mine and integrate long-range dependencies within the same domain and across domains. Through long-range dependency modeling, the network fully realizes domain-specific information extraction and cross-domain complementary information integration, while maintaining appropriate apparent intensity from a global perspective. In particular, we introduce the shifted-window mechanism into the self-attention and cross-attention, which allows our model to process images of arbitrary size. On the other hand, multi-scene image fusion problems are generalized into a unified framework with structure maintenance, detail preservation, and proper intensity control. Moreover, an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information, as well as to present optimal apparent intensity; an illustrative sketch of a cross-attention fusion unit and such a composite loss follows these highlights. Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of our SwinFusion over state-of-the-art unified image fusion algorithms and task-specific alternatives. Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.

  • Recommended by Associate Editor Qing-Long Han.
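
As a rough illustration of the two ingredients named in the highlights, the following PyTorch sketch shows (i) a minimal inter-domain fusion unit in which tokens from one domain attend to tokens from the other via cross-attention, and (ii) a composite loss combining SSIM, texture, and intensity terms. This is a hedged sketch under our own assumptions, not the paper's exact formulation: the names (CrossAttention, fusion_loss, sobel_gradient), the element-wise max aggregation rules, the Sobel kernels, the SSIM window size, and the weights alpha, beta, gamma are all illustrative choices, and window shifting is omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    # Inter-domain fusion unit (illustrative): queries from domain A attend to
    # keys/values from domain B; shifted windows are omitted for brevity.
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens_a, tokens_b):
        # tokens_*: (batch, num_tokens, dim) patch/window embeddings.
        attended, _ = self.attn(query=tokens_a, key=tokens_b, value=tokens_b)
        return self.norm(tokens_a + attended)  # residual connection

def ssim(x, y, window_size=11, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified single-scale SSIM with a uniform window; inputs scaled to [0, 1].
    pad = window_size // 2
    mu_x = F.avg_pool2d(x, window_size, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window_size, stride=1, padding=pad)
    var_x = F.avg_pool2d(x * x, window_size, stride=1, padding=pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window_size, stride=1, padding=pad) - mu_y ** 2
    cov_xy = F.avg_pool2d(x * y, window_size, stride=1, padding=pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def sobel_gradient(img):
    # Gradient magnitude via fixed Sobel kernels (single-channel input assumed).
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    return torch.abs(F.conv2d(img, kx, padding=1)) + torch.abs(F.conv2d(img, ky, padding=1))

def fusion_loss(fused, src_a, src_b, alpha=1.0, beta=10.0, gamma=10.0):
    # L = alpha * L_ssim + beta * L_texture + gamma * L_intensity (weights assumed).
    loss_ssim = (1 - ssim(fused, src_a)) + (1 - ssim(fused, src_b))
    # Texture loss: match the element-wise maximum of the source gradients.
    grad_target = torch.maximum(sobel_gradient(src_a), sobel_gradient(src_b))
    loss_texture = F.l1_loss(sobel_gradient(fused), grad_target)
    # Intensity loss: match the element-wise maximum of the source intensities.
    loss_intensity = F.l1_loss(fused, torch.maximum(src_a, src_b))
    return alpha * loss_ssim + beta * loss_texture + gamma * loss_intensity

if __name__ == "__main__":
    a = torch.rand(1, 1, 128, 128)      # e.g., infrared image
    b = torch.rand(1, 1, 128, 128)      # e.g., visible image
    f = (a + b) / 2                     # stand-in for a network's fused output
    print(fusion_loss(f, a, b).item())
    unit = CrossAttention(dim=64)
    tokens_a, tokens_b = torch.rand(1, 49, 64), torch.rand(1, 49, 64)  # 7x7 windows
    print(unit(tokens_a, tokens_b).shape)

In the released SwinFusion implementation, the attention units operate on shifted local windows and the loss weights are configured per fusion task; the sketch above only mirrors the overall structure described in the highlights.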
  • [1]
    H. Zhang, H. Xu, X. Tian, J. Jiang, and J. Ma, "Image fusion meets deep learning: A survey and perspective," Information Fusion, vol. 76, pp. 323–336, 2021. doi: 10.1016/j.inffus.2021.06.008
    [2]
    J. Ma, L. Tang, M. Xu, H. Zhang, and G. Xiao, "Stdfusionnet: An infrared and visible image fusion network based on salient target detection," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5009513, 2021.
    [3]
    M. Awad, A. Elliethy, and H. A. Aly, "Adaptive near-infrared and visible fusion for fast image enhancement," IEEE Transactions on Computational Imaging, vol. 6, pp. 408–418, 2019.
    [4]
    H. Xu and J. Ma, "Emfusion: An unsupervised enhanced medical image fusion network," Information Fusion, vol. 76, pp. 177–186, 2021. doi: 10.1016/j.inffus.2021.06.001
    [5]
    Y. Liu, X. Chen, J. Cheng, and H. Peng, "A medical image fusion method based on convolutional neural networks," in Proceedings of the International Conference on Information Fusion, 2017, pp. 1–7.
    [6]
    J. Ma, Z. Le, X. Tian, and J. Jiang, "Smfuse: Multi-focus image fusion via self-supervised mask-optimization," IEEE Transactions on Computational Imaging, vol. 7, pp. 309–320, 2021. doi: 10.1109/TCI.2021.3063872
    [7]
    L. Tang, J. Yuan, and J. Ma, "Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network," Information Fusion, vol. 82, pp. 28–42, 2022. doi: 10.1016/j.inffus.2021.12.004
    [8]
    G. Bhatnagar and Q. J. Wu, "A fractal dimension based framework for night vision fusion," IEEE/CAA Journal of Automatica Sinica, vol. 6, no. 1, pp. 220–227, 2018. doi: 10.1109/JAS.2018.7511102
    [9]
    Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 699–714, 2005. doi: 10.1109/TPAMI.2005.93
    [10]
    H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, "U2fusion: A unified unsupervised image fusion network," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 502–518, 2022. doi: 10.1109/TPAMI.2020.3012548
    [11]
    H. Xu, X. Wang, and J. Ma, "Drf: Disentangled representation for visible and infrared image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5006713, 2021.
    [12]
    Q. Han and C. Jung, "Deep selective fusion of visible and near-infrared images using unsupervised u-net," IEEE Transactions on Neural Networks and Learning Systems, 2022.
    [13]
    K. Ram Prabhakar, V. Sai Srikar, and R. Venkatesh Babu, "Deepfuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4714–4722.
    [14]
    Y. Liu, X. Chen, H. Peng, and Z. Wang, "Multi-focus image fusion with a deep convolutional neural network," Information Fusion, vol. 36, pp. 191–207, 2017. doi: 10.1016/j.inffus.2016.12.001
    [15]
    Y. Liu, S. Liu, and Z. Wang, "A general framework for image fusion based on multi-scale transform and sparse representation," Information Fusion, vol. 24, pp. 147–164, 2015. doi: 10.1016/j.inffus.2014.09.004
    [16]
    H. Zhang, H. Xu, Y. Xiao, X. Guo, and J. Ma, "Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity." in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12 797–12 804.
    [17]
    J. Ma, C. Chen, C. Li, and J. Huang, "Infrared and visible image fusion via gradient transfer and total variation minimization," Information Fusion, vol. 31, pp. 100–109, 2016. doi: 10.1016/j.inffus.2016.02.001
    [18]
    H. Li, X. -J. Wu, and J. Kitler, "Mdlatlrr: A novel decomposition method for infrared and visible image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 4733–4746, 2020. doi: 10.1109/TIP.2020.2975984
    [19]
    W. Zhao, H. Lu, and D. Wang, "Multisensor image fusion and enhancement in spectral total variation domain," IEEE Transactions on Multimedia, vol. 20, no. 4, pp. 866–879, 2017.
    [20]
    Y. Zhang, Y. Liu, P. Sun, H. Yan, X. Zhao, and L. Zhang, "Ifcnn: A general image fusion framework based on convolutional neural network," Information Fusion, vol. 54, pp. 99–118, 2020. doi: 10.1016/j.inffus.2019.07.011
    [21]
    F. Zhao, W. Zhao, L. Yao, and Y. Liu, "Self-supervised feature adaption for infrared and visible image fusion," Information Fusion, vol. 76, pp. 189–203, 2021. doi: 10.1016/j.inffus.2021.06.002
    [22]
    H. Li and X. -J. Wu, "Densefuse: A fusion approach to infrared and visible images," IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2614–2623, 2019. doi: 10.1109/TIP.2018.2887342
    [23]
    H. Xu, H. Zhang, and J. Ma, "Classification saliency-based rule for visible and infrared image fusion," IEEE Transactions on Computational Imaging, vol. 7, pp. 824–836, 2021. doi: 10.1109/TCI.2021.3100986
    [24]
    J. Ma, H. Xu, J. Jiang, X. Mei, and X. -P. Zhang, "Ddcgan: A dualdiscriminator conditional generative adversarial network for multiresolution image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 4980–4995, 2020. doi: 10.1109/TIP.2020.2977573
    [25]
    H. Xu, J. Ma, and X. -P. Zhang, "Mef-gan: Multi-exposure image fusion via generative adversarial networks," IEEE Transactions on Image Processing, vol. 29, pp. 7203–7216, 2020. doi: 10.1109/TIP.2020.2999855
    [26]
    J. Huang, Z. Le, Y. Ma, F. Fan, H. Zhang, and L. Yang, "Mgmdcgan: Medical image fusion using multi-generator multi-discriminator conditional generative adversarial network," IEEE Access, vol. 8, pp. 55 145–55 157, 2020. doi: 10.1109/ACCESS.2020.2982016
    [27]
    J. Chen, X. Li, L. Luo, X. Mei, and J. Ma, "Infrared and visible image fusion based on target-enhanced multiscale transform decomposition," Information Sciences, vol. 508, pp. 64–78, 2020. doi: 10.1016/j.ins.2019.08.066
    [28]
    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 1–11.
    [29]
    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., "An image is worth 16x16 words: Transformers for image recognition at scale," in Proceedings of the International Conference on Learning Representations, 2020, pp. 1–12.
    [30]
    N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in Proceedings of the European Conference on Computer Vision, 2020, pp. 213–229.
    [31]
    S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, P. H. Torr et al., "Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 6881–6890.
    [32]
    H. Chen, Y. Wang, T. Guo, C. Xu, Y. Deng, Z. Liu, S. Ma, C. Xu, C. Xu, and W. Gao, "Pre-trained image processing transformer," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 12 299–12 310.
    [33]
    J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, "Swinir: Image restoration using swin transformer," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 1833–1844.
    [34]
    L. Qu, S. Liu, M. Wang, and Z. Song, "Transmef: A transformer-based multi-exposure image fusion framework using self-supervised multi-task learning," arXiv preprint arXiv: 2112.01030, 2021.
    [35]
    V. VS, J. M. J. Valanarasu, P. Oza, and V. M. Patel, "Image fusion transformer," arXiv preprint arXiv: 2107.09011, 2021.
    [36]
    D. Rao, X. -J. Wu, and T. Xu, "Tgfuse: An infrared and visible image fusion approach based on transformer and generative adversarial network," arXiv preprint arXiv: 2201.10147, 2022.
    [37]
    Y. Fu, T. Xu, X. Wu, and J. Kittler, "Ppt fusion: Pyramid patch transformerfor a case study in image fusion," arXiv preprint arXiv: 2107.13967, 2021.
    [38]
    Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 10 012–10 022.
    [39]
    Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, "Medical image fusion via convolutional sparsity based morphological component analysis," IEEE Signal Processing Letters, vol. 26, no. 3, pp. 485–489, 2019. doi: 10.1109/LSP.2019.2895749
    [40]
    Y. Liu, S. Liu, and Z. Wang, "Multi-focus image fusion with dense sift," Information Fusion, vol. 23, pp. 139–155, 2015. doi: 10.1016/j.inffus.2014.05.004
    [41]
    K. Ma, H. Li, H. Yong, Z. Wang, D. Meng, and L. Zhang, "Robust multi-exposure image fusion: a structural patch decomposition approach," IEEE Transactions on Image Processing, vol. 26, no. 5, pp. 2519–2532, 2017. doi: 10.1109/TIP.2017.2671921
    [42]
    H. Li, L. Li, and J. Zhang, "Multi-focus image fusion based on sparse feature matrix decomposition and morphological filtering," Optics Communications, vol. 342, pp. 1–11, 2015. doi: 10.1016/j.optcom.2014.12.048
    [43]
    H. Li, X. -J. Wu, and J. Kittler, "Infrared and visible image fusion using a deep learning framework," in Proceedings of the International Conference on Pattern Recognition, 2018, pp. 2705–2710.
    [44]
    Y. Liu, X. Chen, H. Peng, and Z. Wang, "Multi-focus image fusion with a deep convolutional neural network," Information Fusion, vol. 36, pp. 191–207, 2017. doi: 10.1016/j.inffus.2016.12.001
    [45]
    H. Ma, Q. Liao, J. Zhang, S. Liu, and J. -H. Xue, "An α-matte boundary defocus model-based cascaded network for multi-focus image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 8668–8679, 2020. doi: 10.1109/TIP.2020.3018261
    [46]
    J. Li, X. Guo, G. Lu, B. Zhang, Y. Xu, F. Wu, and D. Zhang, "Drpl: Deep regression pair learning for multi-focus image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 4816–4831, 2020. doi: 10.1109/TIP.2020.2976190
    [47]
    F. Zhao, W. Zhao, H. Lu, Y. Liu, L. Yao, and Y. Liu, "Depth-distilled multi-focus image fusion," IEEE Transactions on Multimedia, 2021.
    [48]
    W. Zhao, B. Zheng, Q. Lin, and H. Lu, "Enhancing diversity of defocus blur detectors via cross-ensemble network," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8905–8913.
    [49]
    W. Zhao, X. Hou, Y. He, and H. Lu, "Defocus blur detection via boosting diversity of deep ensemble networks," IEEE Transactions on Image Processing, vol. 30, pp. 5426–5438, 2021. doi: 10.1109/TIP.2021.3084101
    [50]
    D. Han, L. Li, X. Guo, and J. Ma, "Multi-exposure image fusion via deep perceptual enhancement," Information Fusion, vol. 79, pp. 248–262, 2022. doi: 10.1016/j.inffus.2021.10.006
    [51]
    Y. Long, H. Jia, Y. Zhong, Y. Jiang, and Y. Jia, "Rxdnfuse: A aggregated residual dense network for infrared and visible image fusion," Information Fusion, vol. 69, pp. 128–141, 2021. doi: 10.1016/j.inffus.2020.11.009
    [52]
    H. Li, X. -J. Wu, and T. Durrani, "Nestfuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models," IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 12, pp. 9645–9656, 2020. doi: 10.1109/TIM.2020.3005230
    [53]
    H. Li, X. -J. Wu, and J. Kittler, "Rfn-nest: An end-to-end residual fusion network for infrared and visible images," Information Fusion, vol. 73, pp. 720–86, 2021.
    [54]
    L. Jian, X. Yang, Z. Liu, G. Jeon, M. Gao, and D. Chisholm, "Sedrfuse: A symmetric encoder–decoder with residual block network for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 20151911, 2021.
    [55]
    J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, "Fusiongan: A generative adversarial network for infrared and visible image fusion," Information Fusion, vol. 48, pp. 11–26, 2019. doi: 10.1016/j.inffus.2018.09.004
    [56]
    H. Zhang, Z. Le, Z. Shao, H. Xu, and J. Ma, "Mff-gan: An unsupervised generative adversarial network with adaptive and gradient joint constraints for multi-focus image fusion," Information Fusion, vol. 66, pp. 40–53, 2021. doi: 10.1016/j.inffus.2020.08.022
    [57]
    J. Ma, W. Yu, C. Chen, P. Liang, X. Guo, and J. Jiang, "Pan-gan: An unsupervised pan-sharpening method for remote sensing image fusion," Information Fusion, vol. 62, pp. 110–120, 2020. doi: 10.1016/j.inffus.2020.04.006
    [58]
    J. Li, H. Huo, C. Li, R. Wang, C. Sui, and Z. Liu, "Multigrained attention network for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5002412, 2021.
    [59]
    J. Li, H. Huo, C. Li, R. Wang, and Q. Feng, "Attentionfgan: Infrared and visible image fusion using attention-based generative adversarial networks," IEEE Transactions on Multimedia, vol. 23, pp. 1383–1396, 2020.
    [60]
    H. Zhang and J. Ma, "Sdnet: A versatile squeeze-and-decomposition network for real-time image fusion," International Journal of Computer Vision, vol. 129, no. 10, pp. 2761–2785, 2021. doi: 10.1007/s11263-021-01501-8
    [61]
    F. Zhao and W. Zhao, "Learning specific and general realm feature representations for image fusion," IEEE Transactions on Multimedia, vol. 23, pp. 2745–2756, 2020.
    [62]
    H. Xu, J. Ma, Z. Le, J. Jiang, and X. Guo, "Fusiondn: A unified densely connected network for image fusion," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12 484–12 491.
    [63]
    A. Srinivas, T. -Y. Lin, N. Parmar, J. Shlens, P. Abbeel, and A. Vaswani, "Bottleneck transformers for visual recognition," in Proceedings of the IEEE Conference on Computer Cision and Pattern Recognition, 2021, pp. 16 519–16 529.
    [64]
    M. Chen, H. Peng, J. Fu, and H. Ling, "Autoformer: Searching transformers for visual recognition," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 12 270–12 280.
    [65]
    X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable detr: Deformable transformers for end-to-end object detection," in Proceedings of the International Conference on Learning Representations, 2020, pp. 1–11.
    [66]
    Z. Sun, S. Cao, Y. Yang, and K. M. Kitani, "Rethinking transformerbased set prediction for object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 3611–3620.
    [67]
    P. Sun, J. Cao, Y. Jiang, R. Zhang, E. Xie, Z. Yuan, C. Wang, and P. Luo, "Transtrack: Multiple object tracking with transformer," arXiv preprint arXiv: 2012.15460, 2020.
    [68]
    X. Chen, B. Yan, J. Zhu, D. Wang, X. Yang, and H. Lu, "Transformer tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 8126–8135.
    [69]
    L. Lin, H. Fan, Y. Xu, and H. Ling, "Swintrack: A simple and strong baseline for transformer tracking," arXiv preprint arXiv: 2112.00995, 2021.
    [70]
    H. Wang, Y. Zhu, H. Adam, A. Yuille, and L. -C. Chen, "Maxdeeplab: End-to-end panoptic segmentation with mask transformers," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 5463–5474.
    [71]
    F. Yang, H. Yang, J. Fu, H. Lu, and B. Guo, "Learning texture transformer network for image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 5791–5800.
    [72]
    H. Zhao and R. Nie, "Dndt: Infrared and visible image fusion via densenet and dual-transformer," in Proceedings of the International Conference on Information Technology and Biomedical Engineering, 2021, pp. 71–75.
    [73]
    J. Li, J. Zhu, C. Li, X. Chen, and B. Yang, "Cgtf: Convolution-guided transformer for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 71, p. 5012314, 2022.
    [74]
    T. Xiao, P. Dollar, M. Singh, E. Mintun, T. Darrell, and R. Girshick, "Early convolutions help transformers see better," in Advances in Neural Information Processing Systems, 2021, pp. 30 392–30 400.
    [75]
    Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004. doi: 10.1109/TIP.2003.819861
    [76]
    L. Tang, J. Yuan, H. Zhang, X. Jiang, and J. Ma, "Piafusion: A progressive infrared and visible image fusion network based on illumination aware," Information Fusion, vol. 83, pp. 79–92, 2022.
    [77]
    Q. Ha, K. Watanabe, T. Karasawa, Y. Ushiku, and T. Harada, "Mfnet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2017, pp. 5108–5115.
    [78]
    M. Brown and S. Süsstrunk, "Multi-spectral sift for scene category recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 177–184.
    [79]
    J. Cai, S. Gu, and L. Zhang, "Learning a deep single image contrast enhancer from multi-exposure images," IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 2049–2062, 2018. doi: 10.1109/TIP.2018.2794218
    [80]
    X. Zhang, "Benchmarking and comparing multi-exposure image fusion algorithms," Information Fusion, vol. 74, pp. 111–131, 2021. doi: 10.1016/j.inffus.2021.02.005
    [81]
    M. Nejati, S. Samavi, and S. Shirani, "Multi-focus image fusion using dictionary-based sparse representation," Information Fusion, vol. 25, pp. 72–84, 2015. doi: 10.1016/j.inffus.2014.10.004
    [82]
    J. Ma, H. Zhang, Z. Shao, P. Liang, and H. Xu, "Ganmcc: A generative adversarial network with multiclassification constraints for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5005014, 2021.
    [83]
    K. Ma, Z. Duanmu, H. Zhu, Y. Fang, and Z. Wang, "Deep guided learning for fast multi-exposure image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 2808–2819, 2020. doi: 10.1109/TIP.2019.2952716
    [84]
    M. B. A. Haghighat, A. Aghagolzadeh, and H. Seyedarabi, "A nonreference image fusion metric based on mutual information of image features," Computers & Electrical Engineering, vol. 37, no. 5, pp. 744–756, 2011.
    [85]
    A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "Pytorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, 2019, pp. 8026–8037.
    [86]
    C. Peng, T. Tian, C. Chen, X. Guo, and J. Ma, "Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation," Neural Networks, vol. 137, pp. 188–199, 2021. doi: 10.1016/j.neunet.2021.01.021
    [87]
    J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
    [88]
    H. Zhang, J. Yuan, X. Tian, and J. Ma, "GAN-FM: Infrared and visible image fusion using gan with full-scale skip connection and dual Markovian discriminators," IEEE Transactions on Computational Imaging, vol. 7, pp. 1134–1147, 2021. doi: 10.1109/TCI.2021.3119954
    [89]
    S. F. Bhat, I. Alhashim, and P. Wonka, "Adabins: Depth estimation using adaptive bins," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 4009–4018.

Figures (17) / Tables (6)

    Article Metrics

Article views: 3015; PDF downloads: 910
