Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (14): 223-230. DOI: 10.3778/j.issn.1002-8331.2004-0386

• Graphics and Image Processing •

Semi-automatic Video Target Annotation by Combining Detection and Tracking

CHEN Qinglin, GU Yu, SONG Zhonghao, NIE Shengdong

  1. College of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
  • Online: 2021-07-15 Published: 2021-07-14

Abstract:

Targets in consecutive video frames are redundant, which makes manual annotation time-consuming and laborious. To address this problem, a semi-automatic video target annotation framework that combines detection and tracking is proposed. First, manually annotated samples are used to train an improved YOLOv3 model offline, and the trained model serves as the detector for online annotation. During online annotation, the target position and label are specified manually in the initial frame; in subsequent frames, the target position is determined automatically from the IOU (Intersection-Over-Union) between the detection box and the tracking box, and the tracker's response output is used to judge whether the target has disappeared, so that annotation of the current target stops automatically. Finally, a key frame extraction algorithm based on target saliency is used to select key frames. The detection performance of the improved YOLOv3 is compared on a self-built ship target dataset, and the effectiveness of the proposed semi-automatic annotation method is verified on a ship video sequence. Experimental results show that the method significantly improves annotation efficiency, generates annotated data quickly, and is suitable for video target annotation tasks in scenes such as ships at sea.

Key words: video image, target annotation, target detection, target tracking, key frame extraction
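
The per-frame decision described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact fusion rule or any threshold values, so everything below is an assumption made for illustration: the corner-format boxes, the function names `iou` and `annotate_frame`, the thresholds `iou_thresh` and `response_thresh`, and the rule itself (take the detection that best overlaps the tracking box if the overlap is high enough, otherwise fall back to the tracking box, and stop when the tracker response is low). It is not the authors' implementation.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def annotate_frame(
    track_box: Box,
    track_response: float,
    detections: List[Box],
    iou_thresh: float = 0.5,       # hypothetical value
    response_thresh: float = 0.2,  # hypothetical value
) -> Optional[Box]:
    """Return the box to record for this frame, or None to stop annotating."""
    # A low tracker response is treated as "target has disappeared".
    if track_response < response_thresh:
        return None
    # Pick the detection that overlaps the tracking box the most, if any.
    best = max(detections, key=lambda d: iou(d, track_box), default=None)
    if best is not None and iou(best, track_box) >= iou_thresh:
        return best
    # No sufficiently overlapping detection: keep the tracker's estimate.
    return track_box
```

In an annotation loop, a `None` result would end annotation of the current target, while any returned box would be written out as that frame's label.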