IEEE/CAA Journal of Automatica Sinica
Citation: Q. M. Cheng, Y. Z. Zhou, H. Y. Huang, and Z. Y. Wang, “Multi-attention fusion and fine-grained alignment for bidirectional image-sentence retrieval in remote sensing,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 8, pp. 1532–1535, Aug. 2022. doi: 10.1109/JAS.2022.105773