Classification of Esophageal Lesions in Endoscopic Images Using Convolutional Neural Network

doi:10.3778/j.issn.1002-8331.2111-0133

Abstract

Abstract: Gastrointestinal endoscopy is the routine technique for esophageal cancer screening. Because lesions seen under endoscopy vary between individuals and resemble one another in shape, color, and texture, the efficiency and accuracy of diagnosing esophageal squamous cell carcinoma depend heavily on the experience of the endoscopist, and lesions are especially prone to misdiagnosis and missed diagnosis under white light endoscopy. To address these problems, a convolutional neural network that integrates bilinear pooling and an attention mechanism is proposed to classify esophageal lesions in white light endoscopic images. The network uses ResNet50 as its backbone, adds a newly designed global channel attention module to recalibrate features across channels, and applies bilinear pooling to fuse multiple feature layers and strengthen the feature representation. Experiments are conducted on a multi-center dataset of white light endoscopic images from 2,101 clinical patients. The proposed method classifies esophageal lesions with an accuracy of 94.2% at the image level and 96.9% at the patient level. For esophageal squamous cell carcinoma, sensitivity and specificity are 95.4% and 98.8% at the image level and 98.7% and 95.9% at the patient level. All of these results exceed those of the other recent models and methods compared in the experiments. The results indicate that the proposed network classifies esophageal lesions under white light endoscopy well, can effectively improve the diagnostic accuracy for esophageal squamous cell carcinoma, and is robust.

Key words: esophageal squamous cell carcinoma, white light endoscopic images, convolutional neural network, bilinear pooling, attention mechanism
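The abstract reports accuracy, sensitivity, and specificity at both the image level and the patient level. The sketch below illustrates how such figures can be computed for one class against the rest. It is not the authors' evaluation code: the abstract does not say how image predictions are combined per patient, so the majority vote used here is an assumption, and the class names `normal` and `esophagitis`, the patient IDs, and all labels are hypothetical toy data.

```python
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`.

    Assumes y_true contains at least one positive and one negative sample.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def to_patient_level(patient_ids, labels):
    """Collapse image labels to one label per patient by majority vote.

    Majority voting is an assumed aggregation rule, not taken from the paper.
    Ties go to the label seen first for that patient.
    """
    votes = defaultdict(Counter)
    for pid, label in zip(patient_ids, labels):
        votes[pid][label] += 1
    return {pid: counts.most_common(1)[0][0] for pid, counts in votes.items()}


# Hypothetical toy data: eight images from three patients.
patient_ids = ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3"]
y_true = ["escc", "escc", "escc", "normal", "normal",
          "esophagitis", "esophagitis", "esophagitis"]
y_pred = ["escc", "escc", "normal", "normal", "normal",
          "esophagitis", "escc", "esophagitis"]

# Image-level metrics use every image as one sample.
img_acc = accuracy(y_true, y_pred)
img_sens, img_spec = sensitivity_specificity(y_true, y_pred, positive="escc")

# Patient-level metrics use one aggregated label per patient.
true_by_patient = to_patient_level(patient_ids, y_true)
pred_by_patient = to_patient_level(patient_ids, y_pred)
pids = sorted(true_by_patient)
pat_true = [true_by_patient[pid] for pid in pids]
pat_pred = [pred_by_patient[pid] for pid in pids]
pat_acc = accuracy(pat_true, pat_pred)
pat_sens, pat_spec = sensitivity_specificity(pat_true, pat_pred, positive="escc")

print(f"image level:   acc={img_acc:.3f} sens={img_sens:.3f} spec={img_spec:.3f}")
print(f"patient level: acc={pat_acc:.3f} sens={pat_sens:.3f} spec={pat_spec:.3f}")
```

In this toy data, two images are misclassified, yet each patient's majority label is still correct. This is one way patient-level figures can differ from image-level ones, as they do in the abstract.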

LONG Qigang, WANG Jinming, LIANG Yan, SONG Jie, FENG Yadong, LI Peng, ZHAO Lingxiao. Classification of Esophageal Lesions in Endoscopic Images Using Convolutional Neural Network[J]. Computer Engineering and Applications, 2023, 59(7): 118-125.
