Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (1): 1-14. DOI: 10.3778/j.issn.1002-8331.2204-0207

• Hot Topics and Reviews •

Survey of Transformer Research in Computer Vision

LI Xiang, ZHANG Tao, ZHANG Zhe, WEI Hongyang, QIAN Yurong   

  1. College of Software, Xinjiang University, Urumqi 830002, China
  • Online: 2023-01-01 Published: 2023-01-01

Abstract: The Transformer is a deep neural network based on the self-attention mechanism. In recent years, Transformer-based models have become a popular research direction in computer vision, and their structures are continually being improved and extended, for example with local attention mechanisms and pyramid structures. This paper reviews and summarizes vision models built on improved Transformer structures from two perspectives: performance optimization and structural improvement. It also compares the strengths and weaknesses of the Transformer and convolutional neural network (CNN) architectures and introduces a new hybrid CNN+Transformer structure. Finally, it summarizes the development of the Transformer in computer vision and discusses future prospects.

Key words: Transformer, convolutional neural network (CNN), hybrid structure, computer vision, deep learning