Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (6): 133-139. DOI: 10.3778/j.issn.1002-8331.1811-0306

• Pattern Recognition and Artificial Intelligence •

多头注意力与语义视频标注

石开,胡燕   

  1. School of Computer, Wuhan University of Technology, Wuhan 430070
  • Publication date: 2020-03-15 Release date: 2020-03-13

Multi-Head Attention and Semantic Video Captioning

SHI Kai, HU Yan   

  1. School of Computer, Wuhan University of Technology, Wuhan 430070, China
  • Online: 2020-03-15 Published: 2020-03-13

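Abstract:

In sequence-to-sequence video captioning models, the video information is heavily compressed by the encoder, so the decoder cannot make full use of it. To solve this problem, a multi-head attention mechanism and semantic information are introduced into the model. Multi-head attention lets the model focus on different parts of the encoder-side video information when generating different words. The semantic information comes from a semantic detection unit that uses multi-label classification to produce semantic probabilities for the video, giving the decoder additional guidance; the improved model remains end-to-end. Experimental results show that the improved model captions videos significantly better and that the proposed improvements clearly strengthen captioning ability.

Keywords: video captioning, multi-head attention, semantic information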

Abstract:

In sequence-to-sequence video captioning models, the video information is heavily compressed during encoding, so the decoder cannot fully utilize it. To solve this problem, a multi-head attention mechanism and semantic information are introduced into the model. Multi-head attention allows the model to focus on different parts of the video information when generating different words. The semantic information is produced by a semantic detection unit, which uses multi-label classification to generate semantic probabilities for the video and thereby provides additional guidance to the decoder. The modified model is still trained end-to-end. Experimental results show that the captioning quality of the modified model improves significantly and that the proposed modifications clearly enhance captioning ability.

Key words: video captioning, multi-head attention, semantic information
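
The two ideas in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no dimensions, projection layout, or fusion scheme, so the scaled dot-product form of attention, the mean-pooled classifier input, the random weights, and all names (`multi_head_attention`, `semantic_probabilities`, `decoder_input`) are assumptions made for illustration only.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(query, frames, heads):
    """Attend over encoder frame features with several independent heads.

    query:  decoder state, a list of d floats
    frames: encoder outputs, a list of T vectors of d floats
    heads:  list of (Wq, Wk, Wv) projection triples, each of shape d_head x d
    Returns the concatenated head outputs and one attention
    distribution over the T frames per head.
    """
    context, weights = [], []
    for Wq, Wk, Wv in heads:
        q = matvec(Wq, query)
        keys = [matvec(Wk, f) for f in frames]
        values = [matvec(Wv, f) for f in frames]
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        alpha = softmax(scores)
        # Weighted sum of the value vectors, one output per head.
        head_out = [sum(a * v[j] for a, v in zip(alpha, values))
                    for j in range(len(q))]
        context.extend(head_out)
        weights.append(alpha)
    return context, weights


def semantic_probabilities(video_feature, W_tags, b_tags):
    """Multi-label classifier: one independent sigmoid per semantic tag."""
    logits = matvec(W_tags, video_feature)
    return [sigmoid(z + b) for z, b in zip(logits, b_tags)]


# Toy sizes and random weights (assumptions; a real model would learn these).
rng = random.Random(0)
d, d_head, num_heads, T, num_tags = 8, 4, 2, 5, 3
frames = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
decoder_state = [rng.uniform(-1, 1) for _ in range(d)]
heads = [tuple(random_matrix(d_head, d, rng) for _ in range(3))
         for _ in range(num_heads)]

context, weights = multi_head_attention(decoder_state, frames, heads)

# Mean-pooled frame features as classifier input (an assumption).
video_feature = [sum(f[j] for f in frames) / T for j in range(d)]
tag_probs = semantic_probabilities(
    video_feature, random_matrix(num_tags, d, rng), [0.0] * num_tags)

# One possible way to hand both signals to the decoder: concatenation.
decoder_input = context + tag_probs
```

Each head produces its own distribution over the frames, which is what lets different heads weight different parts of the video for the same output word. The sigmoid outputs are independent per tag, so several semantic tags can be active for one video, as multi-label classification requires.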