Computer Engineering and Applications, 2024, Vol. 60, Issue (20): 244-253. DOI: 10.3778/j.issn.1002-8331.2306-0386

• Graphics and Image Processing •

SFA-ConvNeXt: Dermoscopic Image Classification for Stepwise Aggregation of Multiscale ConvNeXt

WANG Zetong, ZHANG Junhua, WANG Xiao   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Online: 2024-10-15 Published: 2024-10-15

Abstract: Early detection of skin cancer significantly improves patients' five-year survival rate. However, early malignant lesions in the skin are subtle and their symptoms are not obvious, so specialists must perform multiple biopsies and extract lesion tissue before they can diagnose the lesion type. Existing machine learning methods achieve low recognition accuracy on skin lesion images, mainly because they struggle to attend to spatial detail and shallow semantic features at the same time. To represent spatial position and shallow feature information effectively, and to keep the model from focusing so heavily on fine detail that it misclassifies easily distinguishable images, a stepwise aggregation attention network based on ConvNeXt is proposed. The method uses a hierarchical ConvNeXt encoder to extract deep and shallow features of the lesion region stage by stage, and applies parallel spatial attention to integrate spatial position information with the deep or shallow semantic features, aggregating multi-scale contextual information. A stepwise feature aggregation module is also designed to fuse deep and shallow features through dynamically adjusted weights, which closely mirrors the way a specialist classifies a dermoscopic image: a coarse overview followed by close inspection. On the ISIC2018 and ISIC2019 datasets, the method achieves accuracy, precision, recall, and F1-score of 95.27%, 93.76%, 92.83%, and 93.18% on ISIC2018 and 92.63%, 91.06%, 87.05%, and 88.81% on ISIC2019. Compared with ConvNeXt, accuracy improves by 2.13 and 3.29 percentage points, respectively, indicating that the method extracts both fine-grained and coarse features effectively and provides a new basis for the diagnosis of dermoscopic images.
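The four metrics reported in the abstract can all be derived from a multi-class confusion matrix. The sketch below is illustrative only: the abstract does not state which averaging scheme was used, so macro-averaging is an assumption here, and the function name `macro_metrics` and the 3-class matrix `toy` are invented for the example rather than taken from the paper or from ISIC data.

```python
from typing import Dict, List


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1.

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j. Macro-averaging is an assumption, not
    something the abstract specifies.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-class F1 scores
    }


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
toy = [
    [8, 2, 0],
    [1, 8, 1],
    [0, 2, 8],
]

for name, value in macro_metrics(toy).items():
    print(f"{name}: {value:.4f}")
```

Under macro-averaging, F1 is the mean of the per-class F1 scores, so it generally differs from the harmonic mean of the averaged precision and recall. For the ISIC2018 figures, the harmonic mean of 93.76% and 92.83% is about 93.29%, not the reported 93.18%, which is consistent with (though not proof of) per-class averaging.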

Key words: ConvNeXt, parallel spatial attention, stepwise feature aggregation, dermatoscopic images, image classification
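
The abstract describes stepwise feature aggregation as fusing deep and shallow features with dynamically adjusted weights, but gives no formula. The sketch below is not the paper's module; it is a minimal stand-in that shows one possible reading of the idea: a softmax-gated convex combination, folded from the deepest stage toward the shallowest. Every name (`gate_score`, `fuse`, `stepwise_aggregate`) is hypothetical, the gate is a parameter-free mean activation rather than a learned layer, and features are plain equal-length vectors rather than feature maps.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_score(feature: Sequence[float]) -> float:
    """Hypothetical gate: mean activation (a real model would learn this)."""
    return sum(feature) / len(feature)


def fuse(shallow: Sequence[float], deep: Sequence[float]) -> List[float]:
    """Convex combination of two equal-length vectors.

    The weights depend on the inputs themselves, which is the sense in
    which they are "dynamic" in this sketch.
    """
    w_shallow, w_deep = softmax([gate_score(shallow), gate_score(deep)])
    return [w_shallow * s + w_deep * d for s, d in zip(shallow, deep)]


def stepwise_aggregate(stages: Sequence[Sequence[float]]) -> List[float]:
    """Fold stage features (ordered shallow to deep) one step at a time.

    Assumes all stages were already brought to a common length, e.g. by
    pooling or resampling in a real network.
    """
    fused = list(stages[-1])              # start from the deepest stage
    for shallow in reversed(stages[:-1]):
        fused = fuse(shallow, fused)      # merge in the next shallower stage
    return fused


# Hypothetical features from three encoder stages.
stages = [[0.2, 0.4], [1.0, 0.0], [3.0, 1.0]]
print(stepwise_aggregate(stages))
```

Because each step is a convex combination, every output component stays within the range spanned by the corresponding stage components; a learned gate would replace `gate_score` in any realistic implementation.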