Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (8): 148-156. DOI: 10.3778/j.issn.1002-8331.2112-0108

• Pattern Recognition and Artificial Intelligence •

Text Center Pixel Reconstruction to Achieve Efficient Arbitrary Shape Text Detection

LEI Xiaotang, HU Jing

  1. School of Computer Science, Chengdu University of Information Technology, Chengdu 610000, China
  • Online: 2023-04-15 Published: 2023-04-15

Abstract: Existing scene text detection algorithms cannot detect arbitrary-shape text end to end both efficiently and accurately. To address this, a lightweight text detection algorithm is proposed that reconstructs text instances from text kernels by pixel clustering. Because a lightweight network has weak feature information and a small receptive field, an image-level context module is designed to capture global image information, and a semantic-level context module is designed to learn target-region information; fusing the two enriches the network's features and safeguards detection accuracy. To separate adjacent text instances and localize curved text, the center of each text instance is treated as a cluster center, following the text-kernel idea, and the complete text instance is reconstructed from the kernel center in a single pass of pixel clustering, which enables detection of text of arbitrary shape. On the curved-text datasets Total-Text and CTW1500, the method achieves comprehensive scores of 84.1% and 84.6%, respectively, surpassing the best competing method, CRAFT, and its detection speed of 42 frames/s exceeds that of the fastest competing method, EAST. The method thus detects arbitrary-shape text both efficiently and accurately and is better suited to practical applications.

Key words: image-level context, semantic-level context, pixel clustering, arbitrary shape text, text kernel
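
The clustering step described in the abstract (text kernels act as cluster centers, and surrounding text pixels are attached to them in one pass) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how pixel-to-kernel affinity is measured, so the sketch assumes a per-pixel embedding compared by Euclidean distance, in the style of pixel-aggregation detectors. The function name `reconstruct_instances`, the `embedding` input, and the `max_dist` threshold are all hypothetical.

```python
import numpy as np


def reconstruct_instances(kernel_labels, text_mask, embedding, max_dist=0.8):
    """Grow labeled text kernels into full text instances in a single pass.

    kernel_labels: (H, W) int array, 0 = background, k > 0 = kernel id.
    text_mask:     (H, W) array, nonzero where a pixel belongs to any text.
    embedding:     (H, W, D) float array of per-pixel features (assumed input).
    max_dist:      hypothetical threshold on embedding distance to a kernel.
    """
    out = kernel_labels.copy()
    kernel_ids = [k for k in np.unique(kernel_labels) if k != 0]
    if not kernel_ids:
        return out

    # The mean embedding of each kernel serves as its cluster center.
    centers = np.stack(
        [embedding[kernel_labels == k].mean(axis=0) for k in kernel_ids]
    )

    # Candidates are text pixels that are not already part of a kernel.
    ys, xs = np.nonzero(text_mask.astype(bool) & (kernel_labels == 0))
    if ys.size == 0:
        return out

    # Distance from every candidate pixel to every kernel center: (N, K).
    feats = embedding[ys, xs]
    dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)

    # Attach each candidate to its nearest kernel if it is close enough.
    nearest = dists.argmin(axis=1)
    close = dists[np.arange(nearest.size), nearest] <= max_dist
    out[ys[close], xs[close]] = np.asarray(kernel_ids)[nearest[close]]
    return out
```

Because every candidate pixel is assigned by a single nearest-center lookup, no iterative region growing is needed. Adjacent instances stay separate as long as their kernels are disjoint and their embeddings differ by more than the threshold.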