Review of Deep Neural Network-Based Image Caption

Computer Engineering and Applications, 2021, Vol. 57, Issue (9): 9-22. DOI: 10.3778/j.issn.1002-8331.2012-0539

XU Hao, ZHANG Kai, TIAN Yingjie, CHONG Faguang, WANG Zichao

1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201300, China
2. Shanghai Electrical Research Institute, State Grid Corporation of China, Shanghai 200437, China

Online: 2021-05-01 Published: 2021-04-29

Abstract

The rapid development of deep learning has significantly improved the quality of image captioning. This paper reviews in detail the image captioning methods based on deep neural networks and the current state of research. Image captioning algorithms combine knowledge from computer vision and natural language processing to automatically generate natural language descriptions from the content detected in an image, and they are an important part of scene understanding. Image captioning generally adopts a basic architecture composed of an encoder and a decoder. Improving the encoder or the decoder, or applying methods such as Generative Adversarial Networks (GAN), Reinforcement Learning (RL), Unsupervised Learning (UL), and Graph Convolutional Neural Networks (GCN), can effectively improve the performance of image captioning algorithms. The results, advantages, and disadvantages of the representative models of each class of methods are analyzed, applicable public datasets are introduced, and comparative experiments are carried out on this basis. Finally, the challenges facing image captioning and the directions for future work are discussed.

Key words: deep neural network, computer vision, image caption, encoder-decoder architecture, attention mechanism

XU Hao, ZHANG Kai, TIAN Yingjie, CHONG Faguang, WANG Zichao. Review of Deep Neural Network-Based Image Caption[J]. Computer Engineering and Applications, 2021, 57(9): 9-22.
