Computer Engineering and Applications, 2020, Vol. 56, Issue (6): 133-139. DOI: 10.3778/j.issn.1002-8331.1811-0306


Multi-Head Attention and Semantic Video Captioning

SHI Kai, HU Yan   

  1. School of Computer, Wuhan University of Technology, Wuhan 430070, China
  Online: 2020-03-15   Published: 2020-03-13





In sequence-to-sequence video captioning models, the encoder heavily compresses the video information, so the decoder cannot fully utilize it. To address this problem, a multi-head attention mechanism and semantic information are introduced into the model. Multi-head attention allows the model to focus on different parts of the video information when generating different words. The semantic information comes from a semantic detection unit, which uses a multi-label classification approach to produce semantic probabilities for the video; these probabilities give the decoder additional guidance. The modified model is still trained end-to-end. Experimental results show that the modified model produces significantly better captions, confirming that the proposed modifications improve captioning ability.
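The abstract does not give the model's equations, so what follows is only a minimal sketch, under stated assumptions, of how the two additions could feed a single decoding step. Several choices are assumptions and do not come from the paper. Attention is Transformer-style scaled dot-product attention, with the decoder hidden state as the query and per-frame encoder features as keys and values. The semantic detection unit is a single linear layer with an independent sigmoid per semantic tag, applied to mean-pooled frame features. The decoder input is the concatenation of the word embedding, the attention context, and the semantic probabilities. All names and sizes (`MultiHeadAttention`, `SemanticDetector`, `decoder_input`, four heads, ten tags) are hypothetical. The weights are random placeholders, not trained parameters.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiHeadAttention:
    """Hypothetical multi-head scaled dot-product attention over video frames."""

    def __init__(self, d_query: int, d_feat: int, n_heads: int, d_head: int,
                 rng: random.Random) -> None:
        self.n_heads = n_heads
        self.d_head = d_head
        # Separate query/key/value projections per head, so each head can
        # attend to a different part of the video.
        self.wq = [random_matrix(d_head, d_query, rng) for _ in range(n_heads)]
        self.wk = [random_matrix(d_head, d_feat, rng) for _ in range(n_heads)]
        self.wv = [random_matrix(d_head, d_feat, rng) for _ in range(n_heads)]

    def __call__(self, query: Vector, frames: List[Vector]) -> Tuple[Vector, List[Vector]]:
        context: Vector = []
        all_weights: List[Vector] = []
        for h in range(self.n_heads):
            q = matvec(self.wq[h], query)
            keys = [matvec(self.wk[h], f) for f in frames]
            values = [matvec(self.wv[h], f) for f in frames]
            scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(self.d_head)
                      for k in keys]
            alpha = softmax(scores)  # this head's weights over the frames
            all_weights.append(alpha)
            head_out = [sum(a * v[i] for a, v in zip(alpha, values))
                        for i in range(self.d_head)]
            context.extend(head_out)  # concatenate heads (output projection omitted)
        return context, all_weights


class SemanticDetector:
    """Hypothetical multi-label classifier: one independent sigmoid per tag."""

    def __init__(self, d_feat: int, n_tags: int, rng: random.Random) -> None:
        self.w = random_matrix(n_tags, d_feat, rng)
        self.b = [0.0] * n_tags

    def __call__(self, frames: List[Vector]) -> Vector:
        # Mean-pool frame features into one video-level vector (assumption).
        pooled = [sum(f[i] for f in frames) / len(frames) for i in range(len(frames[0]))]
        logits = matvec(self.w, pooled)
        return [sigmoid(z + b) for z, b in zip(logits, self.b)]


def decoder_input(word_emb: Vector, hidden: Vector, frames: List[Vector],
                  attn: MultiHeadAttention, detector: SemanticDetector) -> Vector:
    """Concatenate word embedding, attention context and semantic probabilities."""
    context, _ = attn(hidden, frames)
    return word_emb + context + detector(frames)


# Usage with toy sizes: 5 frames of 8-dim features, a 6-dim decoder state.
rng = random.Random(0)
frames = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
hidden = [rng.uniform(-1, 1) for _ in range(6)]
word_emb = [rng.uniform(-1, 1) for _ in range(4)]
attn = MultiHeadAttention(d_query=6, d_feat=8, n_heads=4, d_head=3, rng=rng)
detector = SemanticDetector(d_feat=8, n_tags=10, rng=rng)

context, weights = attn(hidden, frames)
probs = detector(frames)
x = decoder_input(word_emb, hidden, frames, attn, detector)
print(len(context), len(probs), len(x))  # 12 10 26
```

Each head produces its own softmax distribution over the frames, so different heads can weight different parts of the video at the same decoding step. The semantic tags use independent sigmoids rather than a softmax because, in multi-label classification, several tags (for example an object and an action) can be present in the same video at once.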

Key words: video captioning, multi-head attention, semantic information


