Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (7): 118-125. DOI: 10.3778/j.issn.1002-8331.2111-0133

• Pattern Recognition and Artificial Intelligence •

基于卷积神经网络的内镜图像中食管病变分类

龙其刚,王金铭,梁燕,宋杰,冯亚东,李鹏,赵凌霄   

    1.中国科学技术大学 生物医学工程学院(苏州) 生命科学与医学部,合肥 230026
    2.中国科学院 苏州生物医学工程技术研究所,江苏 苏州 215163
    3.东南大学附属中大医院 消化内科,南京 210009
  • 出版日期:2023-04-01 发布日期:2023-04-01

Classification of Esophageal Lesions in Endoscopic Images Using Convolutional Neural Network

LONG Qigang, WANG Jinming, LIANG Yan, SONG Jie, FENG Yadong, LI Peng, ZHAO Lingxiao   

    1.Division of Life Sciences and Medicine, School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Hefei 230026, China
    2.Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu 215163, China
    3.Department of Gastroenterology, Zhongda Hospital Affiliated to Southeast University, Nanjing 210009, China
  • Online:2023-04-01 Published:2023-04-01

摘要: 消化内镜检查是食管癌筛查的常规手段。由于内镜下的病灶在形状、颜色和质地上的个体差异和视觉相似性,食管鳞癌的诊断效率和准确率都极大地依赖于内镜医师的经验,尤其在白光内镜下容易被误诊和漏诊。针对上述问题,提出一种融合双线性池化和注意力机制的卷积神经网络,可基于白光内镜图像对食管病变进行分类。该网络以ResNet50作为基本框架,加入全新设计的全局通道注意力模块,重新标定通道间特征,并引入双线性池化操作融合多个特征层,增强特征表达。基于2 101例多中心临床患者的白光内镜图像数据集的实验结果显示,该方法对食管病变的分类准确率在图像和病人级别分别为94.2%和96.9%,对食管鳞癌的敏感度和特异度在图像级别为95.4%和98.8%,在病人级别为98.7%和95.9%,均优于实验中所对比的近年来其他模型和方法。该实验结果表明,提出的网络对白光内镜下的食管病变表现出优异的分类性能,可有效提高食管鳞癌的诊断准确率,同时具有较好的鲁棒性。

关键词: 食管鳞癌, 白光内镜图像, 卷积神经网络, 双线性池化, 注意力机制

Abstract: Gastrointestinal endoscopy is the routine technique for esophageal cancer screening. Because lesions seen under endoscopy vary between individuals and yet resemble one another in shape, color and texture, the efficiency and accuracy of diagnosing esophageal squamous cell carcinoma depend heavily on the experience of the endoscopist, and lesions are easily misdiagnosed or missed, especially under white light endoscopy. To address these problems, a convolutional neural network (CNN) that integrates bilinear pooling and an attention mechanism is proposed to classify esophageal lesions in white light endoscopic images. The network uses ResNet50 as its backbone. A newly designed global channel attention module recalibrates features across channels, and a bilinear pooling operation fuses multiple feature layers to strengthen the feature representation. Experiments are conducted on a multicenter dataset of white light endoscopic images from 2,101 clinical patients. The proposed model achieves a classification accuracy of 94.2% at the image level and 96.9% at the patient level. For esophageal squamous cell carcinoma, its sensitivity and specificity are 95.4% and 98.8% at the image level, and 98.7% and 95.9% at the patient level. These results exceed those of the other recent models and methods compared in the experiments. They indicate that the proposed network classifies esophageal lesions in white light endoscopic images well, can effectively improve the accuracy of diagnosing esophageal squamous cell carcinoma, and is robust.
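
The two operations named in the abstract, channel attention and bilinear pooling, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the internals of the global channel attention module, so a generic squeeze-and-excitation style gate is assumed here, and the signed square root and L2 normalisation follow common bilinear-CNN practice rather than anything stated in the source. The function names and the `weights` matrix are hypothetical.

```python
import math


def channel_attention(feature_map, weights):
    """Rescale each channel by a learned gate (squeeze-and-excitation style).

    feature_map: list of C channels, each a list of N spatial activations.
    weights:     C x C matrix standing in for learned parameters (hypothetical).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(channel) / len(channel) for channel in feature_map]
    # Excite: a single linear layer followed by a sigmoid yields gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * s for w, s in zip(row, squeezed))))
        for row in weights
    ]
    # Recalibrate: multiply every activation of a channel by that channel's gate.
    return [[g * v for v in channel] for g, channel in zip(gates, feature_map)]


def bilinear_pool(feat_a, feat_b):
    """Fuse two feature maps by averaging their outer product over locations.

    feat_a: Ca x N activations, feat_b: Cb x N activations (same N locations).
    Returns a flattened vector of length Ca * Cb.
    """
    n = len(feat_a[0])
    # Outer product of the two channel vectors at each location, averaged.
    pooled = [
        sum(a[k] * b[k] for k in range(n)) / n
        for a in feat_a
        for b in feat_b
    ]
    # Signed square root followed by L2 normalisation (assumed post-processing).
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled
```

In a full network the two inputs to `bilinear_pool` would be attention-weighted feature maps taken from different ResNet50 layers, and the pooled vector would feed the classifier; which layers are fused is not stated in the abstract.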

Key words: esophageal squamous cell carcinoma, white light endoscopic images, convolutional neural network, bilinear pooling, attention mechanism
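
The abstract reports accuracy, sensitivity and specificity at both the image level and the patient level. The sketch below shows how such figures are typically computed from predictions. It rests on stated assumptions: the abstract does not say how image predictions are combined into a patient-level decision, so a simple majority vote is used here for illustration, and the label strings `"escc"` and `"other"` are hypothetical.

```python
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`.

    Assumes at least one positive and one negative reference label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def patient_level(patient_ids, image_preds):
    """Aggregate image predictions per patient by majority vote (assumed rule)."""
    votes = defaultdict(Counter)
    for pid, pred in zip(patient_ids, image_preds):
        votes[pid][pred] += 1
    return {pid: counts.most_common(1)[0][0] for pid, counts in votes.items()}
```

Patient-level metrics are then obtained by passing the aggregated labels, together with one reference label per patient, to the same `accuracy` and `sensitivity_specificity` functions.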