A journal of the IEEE and the CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: R. Wang, Y. Zhu, Z. Zhu, L. Cui, Z. Wan, A. Zhu, Y. Ding, S. Qian, C. Gao, and N. Sang, “LTDNet: A lightweight text detector for real-time arbitrary-shape traffic text detection,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 8, pp. 1–14, Aug. 2025. doi: 10.1109/JAS.2024.125022

LTDNet: A Lightweight Text Detector for Real-Time Arbitrary-Shape Traffic Text Detection

doi: 10.1109/JAS.2024.125022
Funds: This work was supported in part by the National Natural Science Foundation of China (61502164, 62176097), the Natural Science Foundation of Hunan Province (2020JJ4057), the Key Research and Development Program of Changsha Science and Technology Bureau (kq2004050), and the Scientific Research Foundation of the Education Department of Hunan Province of China (21A0052).
  • Abstract: Traffic text detection plays a vital role in understanding traffic scenes. Traffic text, a distinct subset of natural scene text, faces challenges beyond those of general scene text detection, including false alarms from non-traffic text sources such as roadside advertisements and building signs. Existing state-of-the-art methods employ increasingly complex detection frameworks in pursuit of higher accuracy, at the cost of real-time performance. To address this, we propose LTDNet, an efficient real-time traffic text detector that balances accuracy and speed through three techniques. First, a cascaded multilevel feature fusion network mitigates the limitations of lightweight backbone networks, thereby enhancing detection accuracy. Second, a lightweight feature attention module (see the sketch below) improves inference speed without compromising accuracy. Finally, a novel point-to-edge distance vector loss function precisely localizes text instance boundaries in traffic contexts. Extensive experiments on five publicly available datasets validate the superiority of our method and demonstrate state-of-the-art performance. The code will be released at https://github.com/runminwang/LTDNet.
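The abstract does not specify the module designs, so the following is a minimal PyTorch sketch of what a lightweight feature attention block of this general kind could look like. The class name LightweightAttention, the squeeze-and-excitation-style channel gating, and all hyperparameters are illustrative assumptions, not the authors' actual LTDNet module.

```python
import torch
import torch.nn as nn

class LightweightAttention(nn.Module):
    """Hypothetical lightweight channel-attention block (squeeze-and-excitation
    style). Illustrative only; not the module proposed in LTDNet."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pool: one value per channel
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # excite
            nn.Sigmoid(),  # per-channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight each channel of the fused feature map by its learned gate.
        return x * self.gate(self.pool(x))

# Usage: gate a fused multilevel feature map before the detection head.
feat = torch.randn(1, 128, 160, 160)
out = LightweightAttention(128)(feat)
print(out.shape)  # torch.Size([1, 128, 160, 160])
```

Because the gate operates on a 1x1 pooled tensor, its overhead is negligible relative to the backbone, which matches the kind of trade-off the abstract describes: faster inference without sacrificing accuracy.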

     

