A journal of the IEEE and the CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: R. Wang, Y. Zhu, Z. Zhu, L. Cui, Z. Wan, A. Zhu, Y. Ding, S. Qian, C. Gao, and N. Sang, “LTDNet: A lightweight text detector for real-time arbitrary-shape traffic text detection,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 8, pp. 1–14, Aug. 2025. doi: 10.1109/JAS.2024.125022

LTDNet: A Lightweight Text Detector for Real-Time Arbitrary-Shape Traffic Text Detection

doi: 10.1109/JAS.2024.125022
Funds: This work was supported in part by the National Natural Science Foundation of China (61502164, 62176097), the Natural Science Foundation of Hunan Province (2020JJ4057), the Key Research and Development Program of Changsha Science and Technology Bureau (kq2004050), and the Scientific Research Foundation of the Education Department of Hunan Province of China (21A0052).
  • Abstract: Traffic text detection plays a vital role in understanding traffic scenes. Traffic text, a distinct subset of natural scene text, faces challenges beyond those of general scene text detection, including false alarms from non-traffic text sources such as roadside advertisements and building signs. Existing state-of-the-art methods employ increasingly complex detection frameworks in pursuit of higher accuracy, at the cost of real-time performance. To address this, we propose LTDNet, an efficient real-time traffic text detector that balances accuracy and speed through three techniques. First, a cascaded multilevel feature fusion network mitigates the limitations of lightweight backbone networks, thereby enhancing detection accuracy. Second, a lightweight feature attention module (see the sketch below) improves inference speed without compromising accuracy. Finally, a novel point-to-edge distance vector loss function precisely localizes text instance boundaries in traffic contexts. Extensive experiments on five publicly available datasets validate the superiority of our method and demonstrate state-of-the-art performance. The code will be released at https://github.com/runminwang/LTDNet.
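The abstract does not specify the module designs, so the following is a minimal PyTorch sketch of what a lightweight feature attention block of this general kind could look like. The class name LightweightAttention, the squeeze-and-excitation-style channel gating, and all hyperparameters are illustrative assumptions, not the authors' actual LTDNet module.

```python
import torch
import torch.nn as nn

class LightweightAttention(nn.Module):
    """Hypothetical lightweight channel-attention block (squeeze-and-excitation
    style). Illustrative only; not the module proposed in LTDNet."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pool: one value per channel
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # excite
            nn.Sigmoid(),  # per-channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight each channel of the fused feature map by its learned gate.
        return x * self.gate(self.pool(x))

# Usage: gate a fused multilevel feature map before the detection head.
feat = torch.randn(1, 128, 160, 160)
out = LightweightAttention(128)(feat)
print(out.shape)  # torch.Size([1, 128, 160, 160])
```

Because the gate operates on a 1x1 pooled tensor, its overhead is negligible relative to the backbone, which matches the kind of trade-off the abstract describes: faster inference without sacrificing accuracy.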

     

