A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 10 Issue 2
Feb.  2023

IEEE/CAA Journal of Automatica Sinica

Citation: J. Y. Ma, K. N. Zhang, and J. J. Jiang, “Loop closure detection via locality preserving matching with global consensus,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 2, pp. 411–426, Feb. 2023. doi: 10.1109/JAS.2022.105926

Loop Closure Detection via Locality Preserving Matching With Global Consensus

doi: 10.1109/JAS.2022.105926
Funds: This work was supported by the Key Research and Development Program of Hubei Province (2020BAB113)
  • Abstract: A critical component of visual simultaneous localization and mapping (SLAM) is loop closure detection (LCD), the operation that judges whether a robot has returned to a previously visited area. Concretely, given a query image (i.e., the latest view observed by the robot), it proceeds by first retrieving images with similar semantic information and then solving the relative geometric relationship between candidate pairs in 3D space. In this work, a novel appearance-based LCD system is proposed. Specifically, candidate frame selection is conducted with a combination of Super-features and the aggregated selective match kernel (ASMK). We incorporate an incremental strategy into the vanilla ASMK to make it applicable to the LCD task, and demonstrate that this setting is memory-efficient and achieves remarkable performance. To uncover consistent geometry between image pairs during loop closure verification, we propose a simple yet surprisingly effective feature matching algorithm, termed locality preserving matching with global consensus (LPM-GC). The major objective of LPM-GC is to retain the local neighborhood information of true feature correspondences between candidate pairs, where a further global constraint is designed to effectively remove false correspondences in challenging scenes, e.g., those containing numerous repetitive structures. Meanwhile, we derive a closed-form solution that enables our approach to provide reliable correspondences within only a few milliseconds. The performance of the proposed approach has been experimentally evaluated on ten publicly available and challenging datasets. Results show that our method achieves better performance than the state-of-the-art in both feature matching and LCD tasks. We have released our code of LPM-GC at https://github.com/jiayi-ma/LPM-GC.
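To make the locality-preserving idea concrete, here is a minimal, illustrative sketch in Python. It is not the authors' exact LPM-GC formulation (the released code adds the global consensus term, among other refinements): the intuition is that a true correspondence should share most of its K nearest neighbors across the two images, and thresholding that per-match consistency cost is the kind of closed-form selection the abstract refers to. The function name and the defaults k and lam are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def locality_preserving_filter(X, Y, k=6, lam=0.5):
    """X, Y: (N, 2) arrays of putatively corresponding keypoints.
    Returns a boolean inlier mask. k and lam are illustrative defaults."""
    n = len(X)
    # K nearest neighbors of each point within its own image
    # (column 0 of the result is the point itself, so query k+1).
    _, nn_x = cKDTree(X).query(X, k=k + 1)
    _, nn_y = cKDTree(Y).query(Y, k=k + 1)
    cost = np.empty(n)
    for i in range(n):
        shared = np.intersect1d(nn_x[i, 1:], nn_y[i, 1:]).size
        cost[i] = 1.0 - shared / k  # fraction of neighbors not preserved
    # Closed-form selection: minimizing sum_i p_i * (cost_i - lam) over
    # p_i in {0, 1} keeps exactly the matches whose cost falls below lam.
    return cost <= lam
```

Because the cost is separable per match, the selection needs no iterative optimization, which is what makes millisecond-level runtimes plausible for this family of methods.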

     

  • 1 In the introduction of “Super-features”, the term “query” refers to the input of the Transformer architecture, which has a different meaning from its usage in image retrieval and LCD.
    2 https://github.com/jenicek/asmk
    3 https://github.com/axelBarroso/Key.Net-Pytorch
  • [1]
    C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, “Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age,” IEEE Trans. Robot., vol. 32, no. 6, pp. 1309–1332, 2016. doi: 10.1109/TRO.2016.2624754
    [2]
    W. Huang, G. Zhang, and X. Han, “Dense mapping from an accurate tracking SLAM,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 6, pp. 1565–1574, 2020. doi: 10.1109/JAS.2020.1003357
    [3]
    S. Lowry, N. Sünderhauf, P. Newman, J. J. Leonard, D. Cox, P. Corke, and M. J. Milford, “Visual place recognition: A survey,” IEEE Trans. Robot., vol. 32, no. 1, pp. 1–19, 2015.
    [4]
    D. Gálvez-López and J. D. Tardos, “Bags of binary words for fast place recognition in image sequences,” IEEE Trans. Robot., vol. 28, no. 5, pp. 1188–1197, 2012. doi: 10.1109/TRO.2012.2197158
    [5]
    L. Bampis, A. Amanatiadis, and A. Gasteratos, “Encoding the description of image sequences: A two-layered pipeline for loop closure detection,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2016, pp. 4530–4536.
    [6]
    K. Zhang, X. Jiang, and J. Ma, “Appearance-based loop closure detection via locality-driven accurate motion field learning,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 3, pp. 2350–2365, 2022. doi: 10.1109/TITS.2021.3086822
    [7]
    R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “NetVLAD: CNN architecture for weakly supervised place recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 5297–5307.
    [8]
    Y. Xu, J. Huang, J. Wang, Y. Wang, H. Qin, and K. Nan, “ESA-VLAD: A lightweight network based on second-order attention and netvlad for loop closure detection,” IEEE Robot. Autom. Lett., vol. 6, no. 4, pp. 6545–6552, 2021. doi: 10.1109/LRA.2021.3094228
    [9]
    S. An, G. Che, F. Zhou, X. Liu, X. Ma, and Y. Chen, “Fast and incremental loop closure detection using proximity graphs,” arXiv preprint arXiv: 1911.10752, 2019.
    [10]
    S. An, H. Zhu, D. Wei, K. A. Tsintotas, and A. Gasteratos, “Fast and incremental loop closure detection with deep features and proximity graphs,” J. Field Robot., vol. 39, no. 4, pp. 473–493, 2022. doi: 10.1002/rob.22060
    [11]
    J. Sivic and A. Zisserman, “Video Google: A text retrieval approach to object matching in videos,” in Proc. IEEE Int. Conf. Comput. Vis., 2003, pp. 1–8.
    [12]
    J. MacQueen, et al., “Some methods for classification and analysis of multivariate observations,” in Proc. 5th Berkeley Symp. Math. Statist. Probab., 1967, pp. 281–297.
    [13]
    F. Perronnin and C. Dance, “Fisher kernels on visual vocabularies for image categorization,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2007, pp. 1–8.
    [14]
    H. Jégou, M. Douze, and C. Schmid, “Improving bag-of-features for large scale image search,” Int. J. Comput. Vis., vol. 87, no. 3, pp. 316–336, 2010. doi: 10.1007/s11263-009-0285-2
    [15]
    R. Ji, L.-Y. Duan, J. Chen, H. Yao, J. Yuan, Y. Rui, and W. Gao, “Location discriminative vocabulary coding for mobile landmark search,” Int. J. Comput. Vis., vol. 96, no. 3, pp. 290–314, 2012. doi: 10.1007/s11263-011-0472-9
    [16]
    G. Tolias, Y. Avrithis, and H. Jégou, “Image search with selective match kernels: Aggregation across single and multiple images,” Int. J. Comput. Vis., vol. 116, no. 3, pp. 247–261, 2016. doi: 10.1007/s11263-015-0810-4
    [17]
    A. M. Andrew, “Multiple view geometry in computer vision,” Kybernetes, 2001.
    [18]
    M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981. doi: 10.1145/358669.358692
    [19]
    J. Ma, J. Wu, J. Zhao, J. Jiang, H. Zhou, and Q. Z. Sheng, “Nonrigid point set registration with robust transformation learning under manifold regularization,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 12, pp. 3584–3597, 2019. doi: 10.1109/TNNLS.2018.2872528
    [20]
    P. Weinzaepfel, T. Lucas, D. Larlus, and Y. Kalantidis, “Learning super-features for image retrieval,” in Proc. Int. Conf. Learn Representations, 2022.
    [21]
    J. Ma, X. Jiang, A. Fan, J. Jiang, and J. Yan, “Image matching from handcrafted to deep features: A survey,” Int. J. Comput. Vis., vol. 129, no. 1, pp. 23–79, 2021. doi: 10.1007/s11263-020-01359-2
    [22]
    M. Cummins and P. Newman, “Appearance-only SLAM at large scale with fab-map 2.0,” Int. J. Rob. Res., vol. 30, no. 9, pp. 1100–1123, 2011. doi: 10.1177/0278364910385483
    [23]
    M. Cummins and P. Newman, “Fab-map: Probabilistic localization and mapping in the space of appearance,” Int. J. Rob. Res., vol. 27, no. 6, pp. 647–665, 2008. doi: 10.1177/0278364908090961
    [24]
    R. Mur-Artal and J. D. Tardós, “Fast relocalisation and loop closing in keyframe-based SLAM,” in Proc. IEEE Int. Conf. Robot. Automat., 2014, pp. 846–853.
    [25]
    E. S. Stumm, C. Mei, and S. Lacroix, “Building location models for visual place recognition,” Int. J. Rob. Res., vol. 35, no. 4, pp. 334–356, 2016. doi: 10.1177/0278364915570140
    [26]
    A. Angeli, D. Filliat, S. Doncieux, and J.-A. Meyer, “A fast and incremental method for loop-closure detection using bags of visual words,” IEEE Trans. Robot., vol. 24, no. 5, pp. 1027–1037, 2008. doi: 10.1109/TRO.2008.2004514
    [27]
    K. A. Tsintotas, L. Bampis, and A. Gasteratos, “Modest-vocabulary loop-closure detection with incremental bag of tracked words,” Robot. Auto. Syst., vol. 141, p. 103782, 2021.
    [28]
    E. Garcia-Fidalgo and A. Ortiz, “iBoW-LCD: An appearance-based loop-closure detection approach using incremental bags of binary words,” IEEE Robot. Autom. Lett., vol. 3, no. 4, pp. 3051–3057, 2018. doi: 10.1109/LRA.2018.2849609
    [29]
    K. A. Tsintotas, L. Bampis, and A. Gasteratos, “Probabilistic appearance-based place recognition through bag of tracked words,” IEEE Robot. Autom. Lett., vol. 4, no. 2, pp. 1737–1744, 2019. doi: 10.1109/LRA.2019.2897151
    [30]
    K. A. Tsintotas, L. Bampis, and A. Gasteratos, “Assigning visual words to places for loop closure detection,” in Proc. IEEE Int. Conf. Robot Automat., 2018, pp. 1–7.
    [31]
    L. Bampis, A. Amanatiadis, and A. Gasteratos, “Fast loop-closure detection using visual-word-vectors from image sequences,” Int. J. Rob. Res., vol. 37, no. 1, pp. 62–82, 2018. doi: 10.1177/0278364917740639
    [32]
    Y. Xia, J. Li, L. Qi, and H. Fan, “Loop closure detection for visual SLAM using pcanet features,” in Proc. Int. Joint Conf. Neural Netw., 2016, pp. 2274–2281.
    [33]
    J. Ma, S. Wang, K. Zhang, Z. He, J. Huang, and X. Mei, “Fast and robust loop-closure detection via convolutional auto-encoder and motion consensus,” IEEE Trans. Ind. Informat., vol. 18, no. 6, pp. 3681–3691, 2022. doi: 10.1109/TII.2021.3120141
    [34]
    J. Ma, L. Tang, F. Fan, J. Huang, X. Mei, and Y. Ma, “SwinFusion: Cross-domain long-range learning for general image fusion via swin transformer,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1200–1217, 2022. doi: 10.1109/JAS.2022.105686
    [35]
    O. Chum, J. Matas, and J. Kittler, “Locally optimized RANSAC,” in Proc. Joint Pattern Recognit. Symp., 2003, pp. 236–243.
    [36]
    R. Raguram, O. Chum, M. Pollefeys, J. Matas, and J.-M. Frahm, “USAC: A universal framework for random sample consensus,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 2022–2038, 2012.
    [37]
    D. Barath, J. Matas, and J. Noskova, “MAGSAC: Marginalizing sample consensus,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 10197–10205.
    [38]
    D. Barath, J. Noskova, M. Ivashechkin, and J. Matas, “MAGSAC++, a fast, reliable and accurate robust estimator,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 1304–1312.
    [39]
    X. Li and Z. Hu, “Rejecting mismatches by correspondence function,” Int. J. Comput. Vis., vol. 89, no. 1, pp. 1–17, 2010. doi: 10.1007/s11263-010-0318-x
    [40]
    J. Ma, J. Zhao, J. Tian, A. L. Yuille, and Z. Tu, “Robust point matching via vector field consensus,” IEEE Trans. Image Process., vol. 23, no. 4, pp. 1706–1721, 2014. doi: 10.1109/TIP.2014.2307478
    [41]
    H. Chen, X. Zhang, S. Du, Z. Wu, and N. Zheng, “A correntropy-based affine iterative closest point algorithm for robust point set registration,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 4, pp. 981–991, 2019. doi: 10.1109/JAS.2019.1911579
    [42]
    C. Leng, H. Zhang, G. Cai, Z. Chen, and A. Basu, “Total variation constrained non-negative matrix factorization for medical image registration,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 5, pp. 1025–1037, 2021. doi: 10.1109/JAS.2021.1003979
    [43]
    X. Jiang, Y. Xia, X.-P. Zhang, and J. Ma, “Robust image matching via local graph structure consensus,” Pattern Recognit., p. 108588, 2022.
    [44]
    J. Ma, A. Fan, X. Jiang, and G. Xiao, “Feature matching via motion-consistency driven probabilistic graphical model,” Int. J. Comput. Vis., 2022.
    [45]
    J. Ma, J. Zhao, J. Jiang, H. Zhou, and X. Guo, “Locality preserving matching,” Int. J. Comput. Vis., vol. 127, no. 5, pp. 512–531, 2019. doi: 10.1007/s11263-018-1117-z
    [46]
    J. Bian, W.-Y. Lin, Y. Matsushita, S.-K. Yeung, T.-D. Nguyen, and M.-M. Cheng, “GMS: Grid-based motion statistics for fast, ultra-robust feature correspondence,” Int. J. Comput. Vis., vol. 128, no. 6, pp. 1580–1593, 2020. doi: 10.1007/s11263-019-01280-3
    [47]
    X. Jiang, J. Ma, J. Jiang, and X. Guo, “Robust feature matching using spatial clustering with heavy outliers,” IEEE Trans. Image Process., vol. 29, pp. 736–746, 2020. doi: 10.1109/TIP.2019.2934572
    [48]
    J. Ma, Z. Li, K. Zhang, Z. Shao, and G. Xiao, “Robust feature matching via neighborhood manifold representation consensus,” ISPRS J. Photogramm. Remote Sens., vol. 183, pp. 196–209, 2022. doi: 10.1016/j.isprsjprs.2021.11.004
    [49]
    K. Zhang, J. Ma, and J. Jiang, “Loop closure detection with reweighting netvlad and local motion and structure consensus,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 6, pp. 1087–1090, 2022. doi: 10.1109/JAS.2022.105635
    [50]
    K. M. Yi, E. Trulls, Y. Ono, V. Lepetit, M. Salzmann, and P. Fua, “Learning to find good correspondences,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 2666–2674.
    [51]
    J. Ma, X. Jiang, J. Jiang, J. Zhao, and X. Guo, “LMR: Learning a two-class classifier for mismatch removal,” IEEE Trans. Image Process., vol. 28, no. 8, pp. 4045–4059, 2019. doi: 10.1109/TIP.2019.2906490
    [52]
    P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superglue: Learning feature matching with graph neural networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 4938–4947.
    [53]
    G. Tolias, T. Jenicek, and O. Chum, “Learning and aggregating deep local descriptors for instance-level recognition,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 460–477.
    [54]
    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Adv. Neural Inf. Process. Syst., 2017, pp. 1–11.
    [55]
    D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004. doi: 10.1023/B:VISI.0000029664.99615.94
    [56]
    Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 790–799, 1995. doi: 10.1109/34.400568
    [57]
    A. Fan, J. Ma, X. Jiang, and H. Ling, “Efficient deterministic search with robust loss functions for geometric model fitting,” IEEE Trans. Pattern Anal. Mach. Intell., 2021.
    [58]
    A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” Int. J. Rob. Res., vol. 32, no. 11, pp. 1231–1237, 2013. doi: 10.1177/0278364913491297
    [59]
    A. J. Glover, W. P. Maddern, M. J. Milford, and G. F. Wyeth, “FAB-MAP + RatSLAM: Appearance-based slam for multiple times of day,” in Proc. IEEE Int. Conf. Robot. Automat., 2010, pp. 3507–3512.
    [60]
    J.-L. Blanco, F.-A. Moreno, and J. Gonzalez, “A collection of outdoor robotic datasets with centimeter-accuracy ground truth,” Auton. Robots, vol. 27, no. 4, p. 327, 2009.
    [61]
    H. Bay, T. Tuytelaars, and L. Van Gool, “Surf: Speeded up robust features, ” in Proc. Eur. Conf. Comput. Vis., 2006, pp. 404–417.
    [62]
    K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, “A comparison of affine region detectors,” Int. J. Comput. Vis., vol. 65, no. 1–2, pp. 43–72, 2005. doi: 10.1007/s11263-005-3848-x
    [63]
    F. Radenović, G. Tolias, and O. Chum, “Fine-tuning CNN image retrieval with no human annotation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 7, pp. 1655–1668, 2018.
    [64]
    A. B. Laguna and K. Mikolajczyk, “Key.Net: Keypoint detection by handcrafted and learned CNN filters revisited,” IEEE Trans. Pattern Anal. Mach. Intell., 2022. DOI: 10.1109/TPAMI.2022.3145820
    [65]
    Y. Tian, A. Barroso Laguna, T. Ng, V. Balntas, and K. Mikolajczyk, “HyNet: Learning local descriptor with hybrid similarity measure and triplet loss,” in Adv. Neural Inf. Process. Syst., 2020, pp. 7401–7412.
    [66]
    S. A. M. Kazmi and B. Mertsching, “Detecting the expectancy of a place using nearby context for appearance-based mapping,” IEEE Trans. Robot., vol. 35, no. 6, pp. 1352–1366, 2019. doi: 10.1109/TRO.2019.2926475




    Highlights

    • We design an incremental strategy that makes the sound FIRe retrieval method applicable to LCD (see the sketch after this list)
    • We propose LPM-GC, which realizes feature matching within a few milliseconds in LCD
    • A robust, computationally and memory-efficient LCD system is designed
    • Impressive LCD results are shown, especially on long-term routes
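As a companion to the first highlight, here is a hypothetical sketch of an incrementally grown candidate-retrieval loop. A flat FAISS inner-product index over L2-normalized global descriptors stands in for the paper's incremental ASMK/FIRe index; dim, exclusion, and query_then_add are assumptions for illustration, not the released implementation.

```python
import numpy as np
import faiss

dim, exclusion = 256, 40           # descriptor size and temporal window (assumed values)
index = faiss.IndexFlatIP(dim)     # inner product == cosine similarity on L2-normalized vectors

def query_then_add(desc, frame_id, top_k=5):
    """desc: (dim,) L2-normalized global descriptor of the current frame.
    Frames are assumed to be added in order, so database ids equal frame ids."""
    candidates = []
    if index.ntotal > 0:
        k = min(top_k + exclusion, index.ntotal)
        sims, ids = index.search(desc[None, :].astype('float32'), k)
        # Drop temporally adjacent frames, which look similar for trivial reasons.
        candidates = [(int(j), float(s)) for s, j in zip(sims[0], ids[0])
                      if frame_id - j > exclusion][:top_k]
    index.add(desc[None, :].astype('float32'))  # grow the database incrementally
    return candidates
```

Excluding the most recent frames prevents trivially self-similar neighbors from being reported as loop closures; surviving candidates would then be verified geometrically, e.g., by a locality-preserving filter like the one sketched above.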
