Computer Engineering and Applications, 2022, Vol. 58, Issue (1): 197-203. DOI: 10.3778/j.issn.1002-8331.2007-0515

• Pattern Recognition and Artificial Intelligence •

FD-means Clustering Cleaning Algorithm for Near-Duplicate Videos

FU Yan, HAN Ze, YE Ou   

  1. College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710054, China
  • Online: 2022-01-01 Published: 2022-01-06

针对近重复视频的FD-means聚类清洗算法

付燕,韩泽,叶鸥   

  1. 西安科技大学 计算机科学与技术学院,西安 710054

Abstract: In recent years, as the scale of video data has grown, near-duplicate videos have proliferated, and video data quality problems have become increasingly prominent. Cleaning near-duplicate videos can improve the data quality of a video set. However, few studies address near-duplicate video cleaning; most existing work focuses on near-duplicate video detection. Although existing methods can effectively identify near-duplicate videos, they struggle to clean near-duplicate video data automatically, and thereby improve video data quality, while preserving data integrity. To address this problem, this paper proposes a near-duplicate video cleaning method that fuses a VGG-16 deep network with FD-means (feature distance-means) clustering. First, the method uses the MOG2 model and a median filter to perform background segmentation and foreground denoising on the videos. Second, the VGG-16 deep network model extracts deep spatial features of the videos. Finally, a new FD-means clustering algorithm is constructed that updates the cluster center points using the near-duplicate video clusters generated at each iteration, and then deletes the near-duplicate videos in each cluster other than its center point. Experimental results show that the proposed method can effectively and automatically clean near-duplicate videos and improve the data quality of the videos.

Key words: video data quality, near-duplicate videos, video cleaning, VGG-16 deep network, FD-means (feature distance-means) clustering
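
The final step described in the abstract (iteratively updating cluster centers, then deleting every video in a cluster except the one at its center) can be illustrated with a small sketch. The abstract does not define the FD-means distance or update rule, so the code below is not the authors' algorithm: it substitutes a plain k-means-style loop with Euclidean distance and a deterministic farthest-point initialization. The function names (`cluster_and_clean`, `distance`, `mean_vector`) and the toy 2-D feature vectors are hypothetical; in the paper's pipeline the vectors would be VGG-16 features.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_and_clean(features, k, max_iter=100):
    """Cluster feature vectors, then keep one representative per cluster.

    Returns (kept, removed): sorted indices of the videos retained as
    cluster representatives and of the near-duplicates to delete.
    Assumes 1 <= k <= len(features) and max_iter >= 1.
    """
    # Deterministic farthest-point initialization (an assumption, chosen
    # so the sketch does not depend on random seeds).
    centers = [list(features[0])]
    while len(centers) < k:
        farthest = max(features,
                       key=lambda f: min(distance(f, c) for c in centers))
        centers.append(list(farthest))

    assignment = None
    for _ in range(max_iter):
        # Assign every vector to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: distance(f, centers[c]))
            for f in features
        ]
        if new_assignment == assignment:
            break  # clusters are stable
        assignment = new_assignment
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [features[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centers[c] = mean_vector(members)

    # Keep the member closest to each center; mark the rest for deletion.
    kept = []
    for c in range(k):
        idx = [i for i, a in enumerate(assignment) if a == c]
        if idx:
            kept.append(min(idx, key=lambda i: distance(features[i], centers[c])))
    kept.sort()
    removed = [i for i in range(len(features)) if i not in kept]
    return kept, removed


# Toy data: two groups of three near-duplicate "videos" each.
features = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
            [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
kept, removed = cluster_and_clean(features, k=2)
print(kept, removed)
```

Keeping the member nearest to the center, rather than the center itself, means the retained item is always an actual video from the set, which matches the abstract's goal of preserving data integrity while removing redundant copies.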

摘要: 近几年,随着视频数据规模的不断增加,近重复视频数据不断涌现,视频的数据质量问题越来越突出。通过近重复视频清洗方法,有助于提高视频集的数据质量。然而,目前针对近重复视频清洗问题的研究较少,主要集中于近重复视频检索等方面的研究。现有研究方法尽管可以有效识别近重复视频,但较难在保证数据完整性的前提下,自动清洗近重复视频数据,以便改善视频数据质量。为解决上述问题,提出一种融合VGG-16深度网络与FD-means(feature distance-means)聚类的近重复视频清洗方法。该方法借助MOG2模型和中值滤波算法对视频进行背景分割和前景降噪;利用VGG-16深度网络模型提取视频的深度空间特征;构建一种新的FD-means聚类算法模型,通过迭代产生的近重复视频簇,更新簇类中心点,并最终删除簇中中心点之外的近重复视频数据。实验结果表明,该方法能够有效解决近重复视频数据清洗问题,改善视频的数据质量。

关键词: 视频数据质量, 近重复视频, 视频清洗, VGG-16深度网络, FD-means聚类