Computer Engineering and Applications, 2024, Vol. 60, Issue (9): 228-236. DOI: 10.3778/j.issn.1002-8331.2301-0074

• Pattern Recognition and Artificial Intelligence •

Text Kernel Reconstruction and Expansion for Arbitrary Shape Text Detection

DENG Shengjun, CHEN Niannian

  1. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  • Publication date: 2024-05-01  Release date: 2024-04-29

Abstract: Segmentation-based methods predict text at the pixel level in natural scenes and have substantially improved the detection of arbitrary-shape text. However, effectively separating adjacent text instances remains a challenge. A widely used approach obtains text kernels by shrinking the annotation boundaries and uses them to separate adjacent instances. When the network predicts text kernels, however, it discards most of the information outside the kernel, which degrades the performance of segmentation-based text detection. To address this limitation, a text kernel reconstruction algorithm is proposed that postpones the generation of text kernels to the post-processing stage: the direction field predicted by the network is used to contract each text instance inward, forming its text kernel. In addition, a text kernel expansion algorithm is proposed to restore complete text instances from the text kernels. Experiments show that the proposed method achieves comparable or the best detection performance on three datasets: Total-Text (88.66%), CTW-1500 (87.28%), and MSRA-TD500 (90.65%).

Key words: scene text detection, arbitrary shape, text kernel
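
The post-processing idea in the abstract (contract text pixels inward along a direction field to form kernels, then grow the kernels back into full instances) can be illustrated with a toy sketch. This is not the authors' implementation: the integer-quantized direction field, the fixed step counts, the 4-connected labelling, and all function names (`reconstruct_kernel`, `label_components`, `expand_kernels`, `toy_example`) are assumptions made purely for illustration.

```python
from collections import deque

import numpy as np


def _step(mask, direction, y, x, movable):
    """Advance the selected pixels one step along the field, staying inside the mask."""
    h, w = mask.shape
    ny = np.clip(y + direction[y, x, 0], 0, h - 1)
    nx = np.clip(x + direction[y, x, 1], 0, w - 1)
    move = movable & mask[ny, nx]
    return np.where(move, ny, y), np.where(move, nx, x)


def reconstruct_kernel(mask, direction, steps=1):
    """Assumed kernel reconstruction: the landing positions of all text pixels
    after `steps` moves along the (hypothetical, integer) direction field."""
    y, x = np.nonzero(mask)
    for _ in range(steps):
        y, x = _step(mask, direction, y, x, np.ones(y.shape, dtype=bool))
    kernel = np.zeros(mask.shape, dtype=bool)
    kernel[y, x] = True
    return kernel


def label_components(binary):
    """Label 4-connected components with a breadth-first flood fill."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    current = 0
    for sy, sx in zip(*np.nonzero(binary)):
        if labels[sy, sx]:
            continue
        current += 1
        labels[sy, sx] = current
        queue = deque([(sy, sx)])
        while queue:
            cy, cx = queue.popleft()
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    queue.append((ny, nx))
    return labels


def expand_kernels(mask, direction, kernel_labels, max_steps=10):
    """Assumed kernel expansion: each text pixel follows the field until it
    reaches a labelled kernel pixel and inherits that label (0 if it never does)."""
    ys, xs = np.nonzero(mask)
    y, x = ys.copy(), xs.copy()
    for _ in range(max_steps):
        active = kernel_labels[y, x] == 0  # pixels that have not reached a kernel yet
        if not active.any():
            break
        y, x = _step(mask, direction, y, x, active)
    full = np.zeros_like(kernel_labels)
    full[ys, xs] = kernel_labels[y, x]
    return full


def toy_example(width=8):
    """Two touching 3-row text lines; the field points at each line's centre row."""
    mask = np.ones((6, width), dtype=bool)
    direction = np.zeros((6, width, 2), dtype=np.int64)  # last axis: (dy, dx)
    direction[[0, 3], :, 0] = 1    # top row of each line points down
    direction[[2, 5], :, 0] = -1   # bottom row of each line points up
    return mask, direction
```

In the toy example the two text lines touch, so the segmentation mask alone forms a single connected region. Contracting along the field leaves two separate kernel rows, and expansion then assigns every text pixel back to the instance whose kernel it reaches.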