Multi-Scale Liver Tumor Segmentation Algorithm by Fusing Convolution and Transformer

doi:10.3778/j.issn.1002-8331.2210-0084

Abstract

Accurate automatic segmentation of the liver and liver tumors helps physicians diagnose and treat liver cancer and monitor patients after surgery. Because convolution is intrinsically local, existing convolution-based methods struggle to establish long-range dependencies. The cascaded attention mechanism of the Transformer can establish global information associations, but it destroys local details. To address this, a feature modeling method that fuses convolution and Transformer is proposed. The method interactively fuses local and global representations through mixed embedding, establishing global dependencies at different resolutions to the greatest extent possible. A multi-level feature fusion module at the skip connections captures contextual information from different encoding stages to obtain richer semantic information. To cope with the variation of liver tumors in size and shape, a deformable multi-scale module extracts multi-scale tumor features. The experiments mainly use the Dice similarity coefficient (DSC) as the evaluation metric. On the LiTS17 dataset, the DSCs for liver and tumor are 0.920 and 0.748, respectively, and the results show that the proposed network segments liver tumors more accurately than the baseline.
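For readers unfamiliar with the metric, the Dice similarity coefficient between a predicted mask A and a ground-truth mask B is commonly defined as DSC = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that definition for flattened binary masks, not the authors' evaluation code. The function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    Hypothetical helper for illustration; nonzero entries count as foreground.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")

    # |A ∩ B|: voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # |A| + |B|: foreground voxels in each mask, counted separately.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted voxels, 3 true voxels, 2 of them overlapping.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
score = dice_coefficient(pred, target)
print(f"DSC = {score:.3f}")
```

In the toy example the overlap is 2 voxels and each mask has 3 foreground voxels, so the score is 2·2 / (3 + 3) ≈ 0.667. A reported tumor DSC of 0.748 is read on the same 0 to 1 scale.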

Key words: medical image, tumor segmentation, Transformer, convolutional neural network, multi-scale, feature fusion

CHEN Lifang, LUO Shiyong. Multi-Scale Liver Tumor Segmentation Algorithm by Fusing Convolution and Transformer[J]. Computer Engineering and Applications, 2024, 60(4): 270-279.

陈丽芳, 罗世勇. 融合卷积和Transformer的多尺度肝肿瘤分割方法[J]. 计算机工程与应用, 2024, 60(4): 270-279.
