Attention Mechanism Image Understanding Algorithm of Ocean Scene

doi:10.3778/j.issn.1002-8331.2010-0297

Abstract

To address the characteristics of complex ocean scenes (multi-scale targets, diverse objects, large style differences, strong spatiotemporal correlation, and uncertain targets), this paper studies an attention-based method for extracting effective features from complex images, and proposes a Chinese description generation model for complex ocean scene images that combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network. Combined with the Jieba word segmentation tool, the model achieves automatic translation of complex ocean scene monitoring images. The model is built and the algorithm is verified using 91 Satellite Map Assistant imagery and high-definition UAV image data. The results show that Inception-v4 has a stronger ability to extract complex features than VGG16: on the same dataset, the image classification ability of Inception-v4 is about 5.3 percentage points higher. The Chinese image description generation algorithm based on the CNN and LSTM model is basically feasible and can solve the problem of automatically annotating batches of images, but the stability of the algorithm and the accuracy of its descriptions need further improvement.

Key words: image feature extraction, attention mechanism, long short-term memory model, image description generation, Chinese word segmentation
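
The abstract does not spell out how the attention weights are computed, so the following is only a minimal sketch of one common form, dot-product soft attention, in which a decoder (LSTM) state scores each region of a CNN feature map and the weighted average of the regions becomes the context for the next word. The function name `soft_attention`, the toy feature values, and the choice of dot-product scoring are all assumptions made for illustration, not the authors' model.

```python
import math


def soft_attention(region_features, hidden_state):
    """Dot-product soft attention (illustrative assumption, not the paper's exact form).

    region_features: list of equal-length feature vectors, one per image region
                     (for example, cells of a CNN feature map).
    hidden_state:    decoder (LSTM) state vector of the same length.
    Returns (weights, context).
    """
    # Relevance score of each region: dot product with the decoder state.
    scores = [sum(f * h for f, h in zip(feat, hidden_state))
              for feat in region_features]
    # Numerically stable softmax turns scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the region features.
    dim = len(hidden_state)
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


# Three hypothetical 2-D region features and one hypothetical decoder state.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = soft_attention(regions, state)
```

In this toy input the first and third regions align equally well with the decoder state, so they receive equal weight and the second region is down-weighted. In a real captioning model the scores would typically come from learned projections rather than a raw dot product.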

WU Man, WEN Lili, SUN Miao. Attention Mechanism Image Understanding Algorithm of Ocean Scene[J]. Computer Engineering and Applications, 2022, 58(10): 231-239.
