IEEE/CAA Journal of Automatica Sinica
Citation: J. Y. Ma, K. N. Zhang, and J. J. Jiang, “Loop closure detection via locality preserving matching with global consensus,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 2, pp. 411–426, Feb. 2023. doi: 10.1109/JAS.2022.105926