Computer Engineering and Applications, 2021, Vol. 57, Issue (9): 9-22. DOI: 10.3778/j.issn.1002-8331.2012-0539

Review of Deep Neural Network-Based Image Caption

XU Hao, ZHANG Kai, TIAN Yingjie, CHONG Faguang, WANG Zichao   

  1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201300, China
  2. Shanghai Electrical Research Institute, State Grid Corporation of China, Shanghai 200437, China
  • Online: 2021-05-01  Published: 2021-04-29




With the rapid development of deep learning, the quality of image captioning has improved significantly. This paper reviews in detail the image captioning methods based on deep neural networks and their research status. Image captioning algorithms combine computer vision and natural language processing to automatically generate natural language descriptions of the content detected in an image, and are an important part of scene understanding. The basic architecture for the image captioning task generally consists of an encoder and a decoder. Performance can be effectively improved by improving the encoder or the decoder, or by applying Generative Adversarial Networks (GAN), Reinforcement Learning (RL), Unsupervised Learning (UL), and Graph Convolutional Neural Networks (GCN). The effects, advantages, and disadvantages of each representative model are then analyzed, and public datasets are introduced. On this basis, comparative experiments are carried out. Finally, the challenges of image captioning and possible directions for future work are discussed.

Key words: deep neural network, computer vision, image caption, encoder-decoder architecture, attention mechanism


