Computer Engineering and Applications (计算机工程与应用), 2024, Vol. 60, Issue (4): 270-279. DOI: 10.3778/j.issn.1002-8331.2210-0084

• Graphics and Image Processing •

Multi-Scale Liver Tumor Segmentation Algorithm by Fusing Convolution and Transformer

CHEN Lifang (陈丽芳), LUO Shiyong (罗世勇)

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2024-02-15 Published: 2024-02-15

Abstract: Accurate automatic segmentation of the liver and liver tumors is important in helping physicians diagnose and treat liver cancer and monitor patients after surgery. Because convolution is intrinsically local, existing convolution-based methods struggle to establish long-range dependencies. The Transformer's cascaded attention mechanism can establish global information associations, but it destroys local details. To address this, a feature modeling method that fuses convolution and Transformer is proposed. The method interactively fuses local and global representations through mixed embedding, establishing global dependencies at different resolutions to the greatest possible extent. At the skip connections, a multi-level feature fusion module captures contextual information from different encoding stages to obtain richer semantic information. To cope with the variation of liver tumors in size and shape, a deformable multi-scale module is used to extract multi-scale tumor features. The experiments mainly use the Dice similarity coefficient (DSC) as the evaluation metric. On the LiTS17 dataset, the DSCs for liver and tumor are 0.920 and 0.748, respectively, and the results show that the proposed network segments liver tumors more accurately than the baseline.
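
As a reminder of what the reported metric measures, the Dice similarity coefficient between a predicted region A and a reference region B is commonly defined as DSC = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `dice_coefficient`, the toy label maps, the label encoding (0 = background, 1 = liver, 2 = tumor), and the handling of the empty-mask case are all assumptions made for illustration.

```python
def dice_coefficient(pred, target, label):
    """Return DSC = 2|A ∩ B| / (|A| + |B|) for one label.

    pred and target are equally sized 2D label maps (lists of lists of ints).
    """
    intersection = 0
    pred_count = 0
    target_count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_hit = p == label
            t_hit = t == label
            pred_count += p_hit            # |A|: pixels predicted as this label
            target_count += t_hit          # |B|: pixels annotated as this label
            intersection += p_hit and t_hit  # |A ∩ B|
    if pred_count + target_count == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / (pred_count + target_count)


# Hypothetical 3x4 label maps: 0 = background, 1 = liver, 2 = tumor (assumed encoding).
pred = [
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 1, 2, 0],
]
target = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 1, 2, 2],
]

liver_dsc = dice_coefficient(pred, target, label=1)  # 2*4 / (4 + 5)
tumor_dsc = dice_coefficient(pred, target, label=2)  # 2*2 / (3 + 3)
print(f"liver DSC = {liver_dsc:.3f}, tumor DSC = {tumor_dsc:.3f}")
```

In this toy case the small tumor region loses far more DSC from two misplaced pixels than the larger liver region does from one, which illustrates why tumor scores tend to be lower than liver scores for the same absolute error.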

Key words: medical image, tumor segmentation, Transformer, convolutional neural network, multi-scale, feature fusion