Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (11): 211-218.DOI: 10.3778/j.issn.1002-8331.2003-0331


Research on Video Summarization Technology Combining Local Reward Mechanism

MEI Feng, ZHOU Juanping, LU Lu   

  1. Zhongshan Branch of Guangdong Broadcast & Video Network Co., Ltd., Zhongshan, Guangdong 528403, China
  2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China

  • Online: 2021-06-01  Published: 2021-05-31


Abstract:

Video summarization aims to shorten a video while preserving its main content, greatly reducing the time spent browsing videos. A key step in video summarization is evaluating the quality of generated summaries, and most existing methods evaluate a summary against the whole video. However, evaluation over the entire video sequence is computationally expensive, especially for long videos. Moreover, evaluating the generated summary over the entire video often ignores the inherent temporal relationships in the video data, so the resulting summary lacks a logical storyline. This paper therefore proposes a novel video summarization framework called the Attentive Local Reward Summarization Network (ALRSN). Specifically, the model predicts frame-level importance scores through a self-attention mechanism and generates the summary from these scores. To evaluate the generated summary, it further designs a local reward function that jointly accounts for local diversity and local representativeness. The generated summary is mapped back to the original video and evaluated within local segments, so the evaluation preserves the video's temporal structure. In addition, the local reward function encourages the model to produce a more diverse and representative summary within each local segment, thereby obtaining a higher reward. Comprehensive experiments on two benchmark datasets, SumMe and TVSum, show that the ALRSN model outperforms state-of-the-art methods.
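The abstract does not give the exact formulation of the local reward, so the following is only a minimal sketch of the idea: split the video into fixed-size local segments (the `window` parameter is a hypothetical choice), and within each segment score the selected frames by a diversity term (mean pairwise feature dissimilarity) and a representativeness term (how closely selected frames cover all frames in the segment), modeled on rewards commonly used in reinforcement-learning-based summarization.

```python
import numpy as np

def local_reward(features, selected, window=30):
    """Sketch of a local diversity + representativeness reward.

    features: (T, D) array of L2-normalized frame features
    selected: (T,) boolean mask marking summary frames
    window:   local segment length in frames (hypothetical value)
    """
    T = len(features)
    segment_rewards = []
    for start in range(0, T, window):
        seg = slice(start, min(start + window, T))
        idx = np.flatnonzero(selected[seg]) + start
        if len(idx) == 0:
            segment_rewards.append(0.0)  # nothing selected in this segment
            continue
        sel = features[idx]
        # Local diversity: mean pairwise dissimilarity (1 - cosine similarity)
        # among the frames selected within this segment.
        if len(idx) > 1:
            sim = sel @ sel.T
            off_diag = ~np.eye(len(idx), dtype=bool)
            diversity = (1.0 - sim[off_diag]).mean()
        else:
            diversity = 0.0
        # Local representativeness: every frame in the segment should lie
        # close to at least one selected frame.
        dists = ((features[seg][:, None, :] - sel[None, :, :]) ** 2).sum(-1)
        representativeness = np.exp(-dists.min(axis=1).mean())
        segment_rewards.append(0.5 * (diversity + representativeness))
    # Averaging per-segment rewards keeps the evaluation local in scope.
    return float(np.mean(segment_rewards))
```

Because each segment is scored independently, a summary only earns a high reward if it is diverse and representative everywhere along the timeline, not just globally, which is the property the local reward is meant to enforce.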

Key words: computer vision, video summarization, attention mechanism, local reward function
