[1] ZHANG C S, TAO Y F, DU K, et al. Character-level street view text spotting based on deep multisegmentation network for smarter autonomous driving[J]. IEEE Transactions on Artificial Intelligence, 2022, 3(2): 297-308.
[2] LIU C L, JIN L W, BAI X, et al. Frontiers of intelligent document analysis and recognition: review and prospects[J]. Journal of Image and Graphics, 2023, 28(8): 2223-2252. (in Chinese)
[3] LIU Y J, YI X H, LI Y G, et al. Application of scene text recognition technology based on deep learning: a survey[J]. Computer Engineering and Applications, 2022, 58(4): 52-63. (in Chinese)
[4] LIAO M, SHI B, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690.
[5] NEUMANN L, MATAS J. Real-time lexicon-free scene text localization and recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(9): 1872-1885.
[6] QIAO L, TANG S L, CHENG Z Z, et al. Text perceptron: towards end-to-end arbitrary-shaped text spotting[J]. arXiv:2002.06820, 2020.
[7] ZHANG X, SU Y W, TRIPATHI S, et al. Text spotting transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 9509-9518.
[8] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000-6010.
[9] KITTENPLON Y, LAVI I, FOGEL S, et al. Towards weakly-supervised text spotting using a multi-task transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 4594-4603.
[10] XING L J, TIAN Z, HUANG W L, et al. Convolutional character networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 9125-9135.
[11] LYU P Y, LIAO M H, YAO C, et al. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 71-88.
[12] QIAO L, CHEN Y, CHENG Z Z, et al. MANGO: a mask attention guided one-stage scene text spotter[J]. arXiv:2012.04350, 2020.
[13] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 213-229.
[14] RAISI Z, NAIEL M A, YOUNES G, et al. Transformer-based text detection in the wild[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2021: 3156-3165.
[15] TANG J Q, ZHANG W Q, LIU H Y, et al. Few could be better than all: feature sampling and grouping for scene text detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 4553-4562.
[16] LIU S L, LI F, ZHANG H, et al. DAB-DETR: dynamic anchor boxes are better queries for DETR[J]. arXiv:2201.12329, 2022.
[17] HUANG M X, LIU Y L, PENG Z H, et al. SwinTextSpotter: scene text spotting via better synergy between text detection and text recognition[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 4583-4593.
[18] CHEN Z Z, XU B, DING X J, et al. Natural scene text recognition based on encoder-decoder framework with dual supervision mechanism[J]. Computer Engineering and Applications, 2022, 58(6): 128-133. (in Chinese)
[19] WANG K, BABENKO B, BELONGIE S. End-to-end scene text recognition[C]//Proceedings of the 2011 International Conference on Computer Vision. Piscataway: IEEE, 2011: 1457-1464.
[20] BISSACCO A, CUMMINS M, NETZER Y, et al. PhotoOCR: reading text in uncontrolled conditions[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2013: 785-792.
[21] SHI B, BAI X, YAO C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11): 2298-2304.
[22] LI H, WANG P, SHEN C H. Towards end-to-end text spotting with convolutional recurrent neural networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 5248-5256.
[23] LIU X B, LIANG D, YAN S, et al. FOTS: fast oriented text spotting with a unified network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 5676-5685.
[24] LIAO M H, PANG G, HUANG J, et al. Mask TextSpotter v3: segmentation proposal network for robust scene text spotting[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 706-722.
[25] LIU Y L, CHEN H, SHEN C H, et al. ABCNet: real-time scene text spotting with adaptive Bezier-curve network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 9806-9815.
[26] LIU Y, SHEN C, JIN L, et al. ABCNet v2: adaptive Bezier-curve network for real-time end-to-end text spotting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(11): 8048-8064.
[27] ZHU X, SU W, LU L, et al. Deformable DETR: deformable transformers for end-to-end object detection[J]. arXiv:2010.04159, 2020.
[28] XUE C, ZHANG W, HAO Y, et al. Language matters: a weakly supervised vision-language pre-training approach for scene text detection and spotting[C]//Proceedings of the European Conference on Computer Vision, 2022: 284-302.
[29] SONG S B, WAN J Q, YANG Z B, et al. Vision-language pre-training for boosting scene text detectors[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 15660-15670.
[30] WAN Q, JI H Q, SHEN L L. Self-attention based text knowledge mining for text detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 5979-5988.
[31] DONG Q, TU Z W, LIAO H F, et al. Visual relationship detection using part-and-sum transformers with composite queries[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 3530-3539.
[32] PENG S D, JIANG W, PI H J, et al. Deep snake for real-time instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 8530-8539.
[33] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007.
[34] REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 658-666.
[35] CH’NG C K, CHAN C S, LIU C L. Total-Text: toward orientation robustness in scene text detection[J]. International Journal on Document Analysis and Recognition, 2020, 23(1): 31-52.
[36] KARATZAS D, SHAFAIT F, UCHIDA S, et al. ICDAR 2013 robust reading competition[C]//Proceedings of the 12th International Conference on Document Analysis and Recognition. Piscataway: IEEE, 2013: 1484-1493.
[37] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]//Proceedings of the 13th International Conference on Document Analysis and Recognition. Piscataway: IEEE, 2015: 1156-1160.
[38] LIU Y L, JIN L W, ZHANG S T, et al. Curved scene text detection via transverse and longitudinal sequence connection[J]. Pattern Recognition, 2019, 90: 337-345.
[39] YE M Y, ZHANG J, ZHAO S S, et al. DPText-DETR: towards better scene text detection with dynamic points in transformer[J]. arXiv:2207.04491, 2022.
[40] YE M Y, ZHANG J, ZHAO S S, et al. DeepSolo: let transformer decoder with explicit points solo for text spotting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 19348-19357.
[41] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944.
[42] LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[J]. arXiv:1711.05101, 2017.
[43] JADERBERG M, SIMONYAN K, VEDALDI A, et al. Reading text in the wild with convolutional neural networks[J]. International Journal of Computer Vision, 2016, 116(1): 1-20.
[44] FENG W, HE W H, YIN F, et al. TextDragon: an end-to-end framework for arbitrary shaped text spotting[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9075-9084.
[45] BAEK Y, LEE B, HAN D, et al. Character region awareness for text detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9357-9366.