Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (4): 52-63.DOI: 10.3778/j.issn.1002-8331.2106-0411
• Research Hotspots and Reviews •
LIU Yanju, YI Xinhai, LI Yange, ZHANG Huiyu, LIU Yanzhong
Online: 2022-02-15
Published: 2022-02-15
刘艳菊,伊鑫海,李炎阁,张惠玉,刘彦忠
LIU Yanju, YI Xinhai, LI Yange, ZHANG Huiyu, LIU Yanzhong. Application of Scene Text Recognition Technology Based on Deep Learning: A Survey[J]. Computer Engineering and Applications, 2022, 58(4): 52-63.
刘艳菊, 伊鑫海, 李炎阁, 张惠玉, 刘彦忠. 深度学习在场景文字识别技术中的应用综述[J]. 计算机工程与应用, 2022, 58(4): 52-63.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2106-0411
[1] ZHANG Zhenwei, HAO Jianguo, HUANG Jian, PAN Chongyu. Review of Few-Shot Object Detection [J]. Computer Engineering and Applications, 2022, 58(5): 1-11.
[2] LU Bingjie, LI Weizhuo, NA Chongning, NIU Zuoyao, CHEN Kui. Survey of Auto Insurance Fraud Detection with Machine Learning Models [J]. Computer Engineering and Applications, 2022, 58(5): 34-49.
[3] QIU Ye, SHAO Xiongkai, GAO Rong, WANG Chunzhi, LI Jing. Social Recommendation Algorithm Based on Attention Gated Neural Network [J]. Computer Engineering and Applications, 2022, 58(5): 112-118.
[4] ZHAO Hong, FU Zhaoyang, ZHAO Fan. Microblog Sentiment Analysis Based on BERT and Hierarchical Attention [J]. Computer Engineering and Applications, 2022, 58(5): 156-162.
[5] HE Yuzhe, HE Ning, ZHANG Ren, LIANG Yubo, LIU Xiaoxiao. Research on Imbalanced Training of Deep Learning Target Detection Model [J]. Computer Engineering and Applications, 2022, 58(5): 172-178.
[6] LIU Jia, BIAN Fangzhou, CHEN Dapeng, LI Weibin. Fingertip Detection Model Based on UGF-Net [J]. Computer Engineering and Applications, 2022, 58(5): 225-231.
[7] CHEN Zhili, GAO Hao, PAN Yixuan, XING Feng. Review of Computer Aided Diagnosis Technology in Mammography [J]. Computer Engineering and Applications, 2022, 58(4): 1-21.
[8] GUO Yingchun, ZHANG Meng, HAO Xiaoke. Review on Content-Aware Image Retargeting Methods [J]. Computer Engineering and Applications, 2022, 58(4): 22-39.
[9] HE Shan, YUAN Jiabin, LU Yaoyao. Research on Lip Reading Based on Visual Characteristics of Chinese Pronunciation [J]. Computer Engineering and Applications, 2022, 58(4): 157-162.
[10] PAN Hui, DUAN Xianhua, LUO Binqiang. Research on Marine Ship Detection Based on Multi-scale Feature Fusion and DCA [J]. Computer Engineering and Applications, 2022, 58(4): 177-185.
[11] XU Xuetian, CAI Yuexin. Motor Imagery Recognition Based on Graph Convolution Network [J]. Computer Engineering and Applications, 2022, 58(4): 186-191.
[12] LI Leiting, WU Guangli, GUO Zhenzhou. Video Summarization Generation Based on Self-attention Mechanism and Random Forest Regression [J]. Computer Engineering and Applications, 2022, 58(4): 198-205.
[13] GUAN Liwen, SUN Xinlei, YANG Pei. Grasping Detection Based on Key Point Estimation [J]. Computer Engineering and Applications, 2022, 58(4): 267-274.
[14] ZHENG Fengxian, WANG Xiali, HE Dandan, LI Nini, FU Yangyang, YUAN Shaoxin. Survey of Single Image Defogging Algorithm [J]. Computer Engineering and Applications, 2022, 58(3): 1-14.
[15] SHE Xiangyang, LI Ruixin, YE Ou. Pedestrian Re-identification Combining Random Erasing and Residual Attention Network [J]. Computer Engineering and Applications, 2022, 58(3): 215-221.