Computer Engineering and Applications, 2020, Vol. 56, Issue (6): 133-139. DOI: 10.3778/j.issn.1002-8331.1811-0306
SHI Kai, HU Yan
石开,胡燕
Abstract:
In sequence-to-sequence video captioning models, the video information is heavily compressed during encoding, so the decoder cannot make full use of it. To address this problem, a multi-head attention mechanism and semantic information are introduced into the model. Multi-head attention allows the model to focus on different parts of the encoded video information when generating different words. The semantic information is produced by a semantic detection unit, which uses multi-label classification to generate semantic probability information for the video and thereby provides additional guidance to the decoder. The modified model is still trained end-to-end. Experimental results show that the captioning quality of the modified model improves significantly, indicating that the proposed modifications are effective at improving captioning ability.
Key words: video captioning, multi-head attention, semantic information
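The two components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the scaled dot-product form of multi-head attention, the mean-pooled video feature, the single-layer sigmoid tag classifier, and every name and dimension below (`multi_head_attention`, `semantic_probabilities`, `w_q`, `num_tags`, and so on) are assumptions chosen for illustration, with random weights standing in for trained ones.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(query, frames, w_q, w_k, w_v, w_o, num_heads):
    """Attend over encoded frames with several heads (assumed scaled dot-product form).

    query:  (d_model,)   current decoder state
    frames: (T, d_model) encoded per-frame video features
    """
    T, d_model = frames.shape
    d_head = d_model // num_heads
    q = (query @ w_q).reshape(num_heads, d_head)       # (H, d_head)
    k = (frames @ w_k).reshape(T, num_heads, d_head)   # (T, H, d_head)
    v = (frames @ w_v).reshape(T, num_heads, d_head)   # (T, H, d_head)
    # Each head scores every frame independently of the other heads.
    scores = np.einsum("hd,thd->ht", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)                 # (H, T), rows sum to 1
    # Weighted sum of values per head, then concatenate the heads.
    context = np.einsum("ht,thd->hd", weights, v).reshape(d_model)
    return context @ w_o, weights


def semantic_probabilities(video_feature, w_s, b_s):
    """Multi-label tag detector: one independent sigmoid per semantic tag."""
    return 1.0 / (1.0 + np.exp(-(video_feature @ w_s + b_s)))


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
T, d_model, num_heads, num_tags = 8, 16, 4, 5
frames = rng.normal(size=(T, d_model))
query = rng.normal(size=d_model)
w_q, w_k, w_v, w_o = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(4))

context, weights = multi_head_attention(query, frames, w_q, w_k, w_v, w_o, num_heads)
tags = semantic_probabilities(
    frames.mean(axis=0),                      # assumed mean-pooled video feature
    rng.normal(scale=0.1, size=(d_model, num_tags)),
    np.zeros(num_tags),
)
```

Because each row of `weights` is a separate distribution over the frames, different heads can emphasize different parts of the video at the same decoding step. The sigmoid outputs in `tags` are not normalized across tags, which is what makes the classifier multi-label rather than single-label.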
SHI Kai, HU Yan. Multi-Head Attention and Semantic Video Captioning[J]. Computer Engineering and Applications, 2020, 56(6): 133-139.
石开,胡燕. 多头注意力与语义视频标注[J]. 计算机工程与应用, 2020, 56(6): 133-139.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.1811-0306
http://cea.ceaj.org/EN/Y2020/V56/I6/133