Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (4): 229-236. DOI: 10.3778/j.issn.1002-8331.2209-0162

• Graphics and Image Processing •

Multi-Scale Detail Enhanced Pyramid Network for Esophageal Lesion Detection

LI Chi, ZHOU Yingyue, YAO Hanmin, LI Xiaoxia, QIN Jiamin, ZHUANG Ming, WEN Liming   

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  2. Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Mianyang, Sichuan 621010, China
  3. Sichuan Mianyang 404 Hospital, Mianyang, Sichuan 621053, China
  • Online:2024-02-15 Published:2024-02-15

Abstract: Lugol's chromoendoscopy (LCE) images of the esophagus exhibit high inter-class similarity and large intra-class scale variation among lesions. To address these problems, this paper proposes a multi-class esophageal lesion detection method that uses Sparse R-CNN as the base network and adds a multi-scale detail enhancement pyramid network (MDEPN) structure. The MDEPN targets the information loss and semantic gaps that arise when the feature pyramid network (FPN) in Sparse R-CNN fuses multi-scale features. First, a Gabor modulated convolution module enhances the features at each scale, exploiting the Gabor filter's sensitivity to orientation and scale to strengthen the representation of texture information in the feature maps. Second, a directional channel pooling module extracts the directional similarity of local features and the correlation between local and global features across scales, reducing the semantic gap when features of different scales are fused. On a self-built multi-class esophageal LCE lesion dataset, the method reaches an mAP0.50 of 65.0%, which is 2.4 percentage points higher than the baseline Sparse R-CNN under the same conditions and exceeds the other mainstream detection methods compared. The MDEPN module can also be integrated into other detection models as an independent structure to improve their performance, which indicates a degree of generality.
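The mAP0.50 figure reported in the abstract can be read with a small sketch of the matching criterion it conventionally implies: a predicted box counts as correct when its class matches the ground truth and its intersection over union (IoU) with the ground-truth box is at least 0.50. The function names, the `(x1, y1, x2, y2)` box format, and the class label below are assumptions for illustration only; the paper's own evaluation code is not shown on this page. The last lines only restate the arithmetic implied by the abstract (65.0% minus 2.4 percentage points).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so that disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_label, gt_box, gt_label, threshold=0.5):
    """Hypothetical helper: same class and IoU at or above the threshold."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= threshold


# Illustrative boxes and label (not taken from the paper's dataset).
overlap = iou((0, 0, 10, 10), (0, 0, 10, 5))
matched = is_true_positive((0, 0, 10, 10), "lesion", (0, 0, 10, 5), "lesion")

# Baseline mAP0.50 implied by the abstract: 65.0% minus 2.4 percentage points.
implied_baseline = round(65.0 - 2.4, 1)
```

Mean average precision then averages the per-class area under the precision-recall curve built from such matches; that step is omitted here to keep the sketch short.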

Key words: detection of esophageal lesions, pyramid network, Gabor modulated convolution, directional channel pooling
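As a rough illustration of the "Gabor modulated convolution" key word, the sketch below builds a small bank of Gabor filters at several orientations and multiplies a convolution kernel element-wise by each of them. This follows the general idea of modulating learned filters with fixed Gabor filters; it is not the authors' module, whose exact design is not given in the abstract. All names and parameter values (`sigma`, `lambd`, `gamma`, the 3x3 size, four orientations, the all-ones stand-in kernel) are assumptions.

```python
import math


def gabor_kernel(size, theta, sigma=2.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter on a size x size grid (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the filter responds to orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size, n_orientations):
    """Filters at evenly spaced orientations in [0, pi)."""
    return [gabor_kernel(size, k * math.pi / n_orientations)
            for k in range(n_orientations)]


def modulate(weights, gabor):
    """Element-wise product of a convolution kernel with a Gabor filter."""
    return [[w * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, gabor)]


# Stand-in for a learned 3x3 kernel; a real network would train these weights.
learned = [[1.0] * 3 for _ in range(3)]
bank = gabor_bank(3, 4)
modulated = [modulate(learned, g) for g in bank]
```

Each modulated kernel inherits the orientation preference of its Gabor filter, which is one way a convolution layer can be biased toward directional texture of the kind the abstract describes.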