Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (6): 227-233.DOI: 10.3778/j.issn.1002-8331.2009-0446

• Graphics and Image Processing •

Research on Propagation of Similarity Between Frames in Video Object Segmentation

ZHANG Xuerui, SUN Fengming, YUAN Xia   

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Online: 2022-03-15  Published: 2022-03-15


Abstract: Most current video object segmentation algorithms segment the target with matching-and-propagation strategies, typically passing information from the previous frame in the form of a mask or optical flow. This paper explores a new way of propagating features between frames: a short-term matching module extracts information from the previous frame and propagates it to the current frame, yielding an object segmentation model for video sequence data. A long-term matching module and the short-term matching module perform pixel-level matching against the first frame and the previous frame respectively, producing a global similarity map and a local similarity map. Together with the mask of the previous frame and the feature map of the current frame, these pass through two refinement networks and then a segmentation network to produce the final result. Experiments on a public video object segmentation dataset show that, without online fine-tuning, the proposed model achieves a mean of region similarity and contour accuracy (J&F) of 86.5% on single-object and 77.4% on multi-object segmentation, running at 21 frames per second. The proposed short-term matching module extracts information from the previous frame more effectively than using the mask alone. By combining the long-term and short-term matching modules, efficient video object segmentation is achieved without online fine-tuning, making the model well suited to environment perception on mobile robots.
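The abstract does not give implementation details, but the core idea of the two matching modules can be illustrated as pixel-level correlation matching: each current-frame feature vector is compared against reference-frame features either over the whole frame (long-term matching against the first frame, giving a global similarity map) or within a local window (short-term matching against the previous frame, giving a local similarity map). The following NumPy sketch is purely illustrative; the function name, window size, and cosine-similarity choice are assumptions, not the paper's specification.

```python
import numpy as np

def similarity_map(curr, ref, window=None):
    """Pixel-level matching between current-frame features `curr` and
    reference-frame features `ref`, both shaped (H, W, C).

    For each current-frame pixel, returns the maximum cosine similarity
    over reference pixels: over the whole reference frame if `window`
    is None (global / long-term matching), otherwise over a square
    window of radius `window` around the pixel (local / short-term
    matching). Output shape is (H, W).
    """
    H, W, C = curr.shape
    # L2-normalise feature vectors so dot products are cosine similarities
    cn = curr / (np.linalg.norm(curr, axis=-1, keepdims=True) + 1e-8)
    rn = ref / (np.linalg.norm(ref, axis=-1, keepdims=True) + 1e-8)
    sim = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            if window is None:
                # global matching: candidates are all reference pixels
                cand = rn.reshape(-1, C)
            else:
                # local matching: candidates lie near (i, j) only
                i0, i1 = max(0, i - window), min(H, i + window + 1)
                j0, j1 = max(0, j - window), min(W, j + window + 1)
                cand = rn[i0:i1, j0:j1].reshape(-1, C)
            sim[i, j] = float((cand @ cn[i, j]).max())
    return sim
```

In a model like the one described, the global map (from the first frame) and the local map (from the previous frame) would then be concatenated with the previous-frame mask and the current-frame features before the refinement and segmentation networks; that fusion step is not sketched here.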

Key words: visual perception, video object segmentation, feature propagation, long-short-term matching
