Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (10): 231-239. DOI: 10.3778/j.issn.1002-8331.2010-0297

• Graphics and Image Processing •

Attention Mechanism Image Understanding Algorithm of Ocean Scene

WU Man, WEN Lili, SUN Miao   

  1. Information Department, Guangxi Academy of Oceanography, Nanning 530022, China
  2. Technology Innovation Center of Marine Information, Ministry of Natural Resources, Tianjin 300171, China
  3. School of Electrical Engineering, Guangxi University, Nanning 530007, China
  4. Information Industry Office, Guangxi Botanical Garden of Medicinal Plants, Nanning 530023, China
  • Online: 2022-05-15 Published: 2022-05-15

Abstract: Complex ocean scenes are characterized by multi-scale targets, diverse objects, large style differences, strong spatiotemporal correlation, and uncertain targets. To address these characteristics, this paper studies an attention-based method for extracting effective features from complex images and proposes a Chinese description generation model for complex ocean scene images that combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network. Combined with the Jieba word segmentation tool, the model automatically translates complex ocean scene monitoring images into text. The model is built and the algorithm is verified using data from the 91 Satellite Map Assistant (91卫图助手) and high-definition UAV imagery. The results show that Inception-v4 extracts complex features better than VGG16: on the same dataset, the image classification performance of the Inception-v4 model is about 5.3 percentage points higher. The Chinese image description generation algorithm based on the CNN and the LSTM is basically feasible and can solve the problem of automatically annotating images in batches, but the stability of the algorithm and the accuracy of its descriptions need further improvement.

Key words: image feature extraction, attention mechanism, long short-term memory model, image description generation, Chinese word segmentation
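
The abstract names an attention mechanism over image features but does not give its form. As a rough illustration only, the sketch below shows one common variant, dot-product soft attention, in which a decoder state scores each image region and the regions are averaged with the resulting weights. The function names (`softmax`, `attend`), the toy vectors, and the dot-product scoring are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Dot-product soft attention (an assumed form, not the paper's).

    region_features: list of feature vectors, one per image region
                     (stand-ins for CNN outputs).
    hidden_state:    current decoder state (stand-in for an LSTM state),
                     same length as each feature vector.
    Returns the attention weights and the weighted-average context vector.
    """
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy data: three regions with two-dimensional features (hypothetical values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
print(weights, context)
```

In a captioning model of the kind described, the context vector would typically be fed to the LSTM at each decoding step so that each generated word can draw on different image regions.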