Multi-Head Attention and Semantic Video Captioning

doi:10.3778/j.issn.1002-8331.1811-0306

Abstract

In sequence-to-sequence video captioning models, the video information is heavily compressed during encoding, so the decoder cannot fully utilize it. To address this problem, a multi-head attention mechanism and semantic information are introduced into the model. Multi-head attention allows the model to focus on different parts of the encoded video information when generating different words. The semantic information is produced by a semantic detection unit, which uses multi-label classification to generate semantic probability information for the video and provides additional guidance to the decoder. The modified model is still trained end-to-end. Experimental results show that the modified model's captioning performance improves significantly, indicating that the proposed modifications are effective at improving captioning ability.

Keywords: video captioning, multi-head attention, semantic information
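
The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the architecture, so this version assumes each head simply works on a slice of the feature vector (no learned projection matrices), uses scaled dot-product scoring, and models the semantic detection unit's output as independent sigmoid probabilities, one per tag. The names `multi_head_attention` and `semantic_probabilities` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(query, frames, num_heads):
    """Attend over encoded frame vectors with several heads.

    query:  decoder state, a list of d floats
    frames: list of T encoded frame vectors, each with d floats
    Returns (context, weights): the concatenated context vector of
    length d and one list of T attention weights per head.
    Assumption: heads are plain slices of the feature dimension.
    """
    d = len(query)
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    head_dim = d // num_heads
    context, all_weights = [], []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = query[lo:hi]
        keys = [f[lo:hi] for f in frames]
        # Scaled dot-product score between the query slice and each frame slice.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of frame slices gives this head's context.
        head_ctx = [
            sum(w * k[i] for w, k in zip(weights, keys))
            for i in range(head_dim)
        ]
        context.extend(head_ctx)
        all_weights.append(weights)
    return context, all_weights


def semantic_probabilities(logits):
    """Multi-label output: an independent sigmoid probability per tag."""
    probs = []
    for z in logits:
        if z >= 0:
            probs.append(1.0 / (1.0 + math.exp(-z)))
        else:
            e = math.exp(z)
            probs.append(e / (1.0 + e))
    return probs


# Toy input: two encoded frames with four features, split into two heads.
frames = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 2.0, 0.0]]
query = [4.0, 0.0, 4.0, 0.0]
context, weights = multi_head_attention(query, frames, num_heads=2)
tag_probs = semantic_probabilities([2.0, -1.0, 0.0])
```

In this toy input the first head puts most of its weight on the first frame and the second head on the second frame, which is the behavior the abstract attributes to multi-head attention. Unlike a softmax over tags, the sigmoid outputs need not sum to one, so several semantic tags can be likely for the same video.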

SHI Kai, HU Yan. Multi-Head Attention and Semantic Video Captioning[J]. Computer Engineering and Applications, 2020, 56(6): 133-139.
