Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (8): 41-55. DOI: 10.3778/j.issn.1002-8331.2206-0022

• Research Hotspots and Reviews •

Review of Research on Application of Vision Transformer in Medical Image Analysis

SHI Lei, JI Qingyu, CHEN Qingwei, ZHAO Hengyi, ZHANG Junxing   

  1. College of Computer Science, Inner Mongolia University, Hohhot 010021, China
  2. College of Computer Science and Technology, Baotou Medical College, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014040, China
  3. The Second Affiliated Hospital of Baotou Medical College, Baotou, Inner Mongolia 014040, China
  • Online: 2023-04-15 Published: 2023-04-15




Abstract: The deep self-attention network (Transformer) is naturally suited to modeling global features and long-range dependencies in its input, which strongly complements the inductive biases of convolutional neural networks (CNNs). Inspired by its great success in natural language processing, the Transformer has been widely adopted across computer vision tasks, especially medical image analysis, where it has achieved remarkable performance. This paper first introduces representative work on vision Transformers for natural images. It then organizes and summarizes related work in three subfields, medical image segmentation, medical image classification, and medical image registration, grouped by lesion or organ, with a detailed analysis of the implementation ideas behind some representative studies. Finally, it discusses current research and outlines future directions, with the aim of providing a reference for further in-depth research in this field.

Key words: vision Transformer, medical image segmentation, medical image classification, medical image registration
