Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (4): 52-63. DOI: 10.3778/j.issn.1002-8331.2106-0411
LIU Yanju (刘艳菊), YI Xinhai (伊鑫海), LI Yange (李炎阁), ZHANG Huiyu (张惠玉), LIU Yanzhong (刘彦忠)
Online: 2022-02-15
Published: 2022-02-15
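Abstract: With the development of deep learning in computer vision, scene text detection and text recognition have made breakthrough progress. However, because of extreme illumination, occlusion, blur, and multi-oriented, multi-scale text in natural scenes, unconstrained scene text detection and recognition still face great challenges. This paper studies scene text detection and text recognition from the perspective of deep learning and draws two conclusions. In text detection, combining the strengths of segmentation-based and regression-based methods can address the low recall of small text regions while also adapting to multi-scale text. In text recognition, combining the CTC mechanism with the Attention mechanism lets the two supervise each other, which improves recognition performance and lowers the error rate on long text.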
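The abstract does not say how CTC and Attention are combined. One generic formulation, assumed here purely for illustration and not taken from the paper, is a weighted sum of a CTC loss and an attention-decoder cross-entropy computed for the same target string. The sketch below uses only the Python standard library; the function names `ctc_nll`, `attention_nll`, and `joint_loss` and the weight `lam` are hypothetical, and the end-of-sequence token that attention decoders normally predict is omitted for brevity.

```python
import math

def ctc_nll(probs, target, blank=0):
    """CTC negative log-likelihood of `target` via the forward algorithm.

    probs:  list of T per-frame distributions over the alphabet.
    target: list of label indices (no blanks).
    """
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)

    # Initialisation: a path may start with the first blank or the first label.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]
            if s >= 1:
                a += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    total = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(total)

def attention_nll(step_probs, target):
    """Cross-entropy of an attention decoder: one distribution per output step."""
    return -sum(math.log(p[c]) for p, c in zip(step_probs, target))

def joint_loss(ctc_probs, att_probs, target, lam=0.5):
    """Hypothetical joint objective: lam * CTC + (1 - lam) * attention."""
    return lam * ctc_nll(ctc_probs, target) + (1 - lam) * attention_nll(att_probs, target)

# Toy example: alphabet {0: blank, 1: 'a'}, two frames, target "a".
ctc_probs = [[0.5, 0.5], [0.5, 0.5]]
att_probs = [[0.2, 0.8]]
loss = joint_loss(ctc_probs, att_probs, [1], lam=0.5)
```

In a weighted sum of this kind both branches are trained against the same label sequence, so the frame-wise alignment constraint of CTC and the step-wise prediction of the attention decoder each shape the shared features. How `lam` is set is a tuning choice that this sketch leaves open.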
LIU Yanju, YI Xinhai, LI Yange, ZHANG Huiyu, LIU Yanzhong. Application of Scene Text Recognition Technology Based on Deep Learning: A Survey[J]. Computer Engineering and Applications, 2022, 58(4): 52-63.