A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 10, Issue 2, February 2023

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
    CiteScore: 23.5, Top 2% (Q1)
    Google Scholar h5-index: 77, Top 5
Citation: J. Y. Ma, K. N. Zhang, and J. J. Jiang, “Loop closure detection via locality preserving matching with global consensus,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 2, pp. 411–426, Feb. 2023. doi: 10.1109/JAS.2022.105926

Loop Closure Detection via Locality Preserving Matching With Global Consensus

doi: 10.1109/JAS.2022.105926
Funds:  This work was supported by the Key Research and Development Program of Hubei Province (2020BAB113)
  • Abstract: A critical component of visual simultaneous localization and mapping (SLAM) is loop closure detection (LCD), the task of judging whether a robot has returned to a previously visited area. Concretely, given a query image (i.e., the latest view observed by the robot), the system first retrieves images with similar semantic information and then solves the relative geometric relationship between candidate pairs in 3D space. In this work, a novel appearance-based LCD system is proposed. Specifically, candidate frame selection is conducted via the combination of Super-features and the aggregated selective match kernel (ASMK). We incorporate an incremental strategy into the vanilla ASMK to make it applicable to the LCD task, and demonstrate that this setting is memory-efficient and achieves remarkable performance. To uncover consistent geometry between image pairs during loop closure verification, we propose a simple yet surprisingly effective feature matching algorithm, termed locality preserving matching with global consensus (LPM-GC). The main objective of LPM-GC is to retain the local neighborhood information of true feature correspondences between candidate pairs, while a further global constraint is designed to effectively remove false correspondences in challenging scenes, e.g., those containing numerous repetitive structures. Meanwhile, we derive a closed-form solution that enables our approach to provide reliable correspondences within only a few milliseconds. The performance of the proposed approach is experimentally evaluated on ten publicly available and challenging datasets. Results show that our method outperforms the state of the art in both feature matching and LCD tasks. Our code for LPM-GC is released at https://github.com/jiayi-ma/LPM-GC. An illustrative sketch of the locality preserving idea is given below.
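The locality preserving idea at the heart of LPM-GC can be illustrated with a short sketch. The snippet below is a minimal illustration written for this page, not the authors' released implementation: it keeps a putative correspondence only if the k-nearest neighbors of its keypoint in the first image are largely re-found among the neighbors of its matched keypoint in the second image. The global consensus constraint and the closed-form weighting of LPM-GC are omitted, and the function and parameter names (lpm_filter, k, tau) are illustrative assumptions only.

```python
import numpy as np
from scipy.spatial import cKDTree

def lpm_filter(pts1, pts2, k=8, tau=0.5):
    """Keep correspondences whose local neighborhoods are preserved across images.

    pts1, pts2: (N, 2) arrays of putatively matched keypoint coordinates.
    Returns a boolean mask over the N correspondences.
    """
    tree1, tree2 = cKDTree(pts1), cKDTree(pts2)
    # k nearest neighbors of every keypoint (the closest hit is the point itself)
    _, nn1 = tree1.query(pts1, k=k + 1)
    _, nn2 = tree2.query(pts2, k=k + 1)
    nn1, nn2 = nn1[:, 1:], nn2[:, 1:]

    keep = np.zeros(len(pts1), dtype=bool)
    for i in range(len(pts1)):
        # Neighborhood overlap: how many of i's neighbors in image 1 are also
        # neighbors of its matched keypoint in image 2.
        common = np.intersect1d(nn1[i], nn2[i]).size
        keep[i] = common / k >= tau
    return keep
```

Because the test only counts shared neighbor indices, it runs in a few milliseconds for typical match set sizes, which is in the same spirit as the fast verification described in the abstract.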

     

  • 1 In the introduction of “Super-features”, the term “query” refers to the input of the Transformer architecture, which has a different meaning from that in image retrieval and LCD.
    2 https://github.com/jenicek/asmk
    3 https://github.com/axelBarroso/Key.Net-Pytorch


    Figures (10) / Tables (6)


    Highlights

    • We design an incremental strategy that makes the sound FIRe retrieval framework applicable to LCD
    • We propose LPM-GC to perform feature matching within a few milliseconds for LCD
    • A robust, computationally and memory-efficient LCD system is designed (see the pipeline sketch below)
    • Impressive LCD results are demonstrated, especially on long-term routes
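To show how the highlighted components fit together, here is a hypothetical, high-level sketch of a per-frame loop closure check as described in the abstract: retrieve candidates from an incrementally built ASMK index, verify each candidate geometrically with LPM-GC, and only then add the new frame to the index. All helper names (extract_super_features, asmk_index, putative_matches, lpm_gc) are placeholders for illustration, not the authors' released API.

```python
def detect_loop(frame, asmk_index, keyframes, min_inliers=20, top_k=5):
    """Hypothetical per-frame LCD step: retrieval, verification, then incremental indexing."""
    feats = extract_super_features(frame)               # local Super-features (placeholder)
    candidates = asmk_index.query(feats, top_k=top_k)   # most similar previously seen frames

    loop = None
    for cand_id, score in candidates:
        matches = putative_matches(feats, keyframes[cand_id])  # descriptor matching (placeholder)
        inliers = lpm_gc(matches)                        # locality preserving + global consensus
        if inliers.sum() >= min_inliers:                 # enough geometrically consistent matches
            loop = (cand_id, inliers)
            break

    # Incremental update: only after querying is the new frame added to the index,
    # so the map grows online without rebuilding the vocabulary.
    asmk_index.add(frame.id, feats)
    keyframes[frame.id] = feats
    return loop
```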
