
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (11): 176-184. DOI: 10.3778/j.issn.1002-8331.2403-0361
• Pattern Recognition and Artificial Intelligence •
XU Shikang, LIU Junfeng, ZENG Jun, LIAO Dingding
Online: 2025-06-01
Published: 2025-05-30
徐诗康,刘俊峰,曾君,廖丁丁
XU Shikang, LIU Junfeng, ZENG Jun, LIAO Dingding. Scene Text Spotting Based on Cross-Modal and Circular Factorized Self-Attention[J]. Computer Engineering and Applications, 2025, 61(11): 176-184.
徐诗康, 刘俊峰, 曾君, 廖丁丁. 基于跨模态和循环分解自注意力的场景文本识别[J]. 计算机工程与应用, 2025, 61(11): 176-184.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2403-0361