Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (15): 118-123. DOI: 10.3778/j.issn.1002-8331.1905-0074

• Pattern Recognition and Artificial Intelligence •

UAV Target Detection Algorithm Based on Deep Representation in Complex Scenes

李斌,张彩霞,杨阳,张文生   

  1. Foshan University, Foshan, Guangdong 528000
  2. Institute of Automation, Chinese Academy of Sciences, Beijing 100080
  • Publication date: 2020-08-01 Release date: 2020-07-30

Drone Target Detection Algorithm for Depth Representation in Complex Scene

LI Bin, ZHANG Caixia, YANG Yang, ZHANG Wensheng   

  1. Department of Automation, Foshan University, Foshan, Guangdong 528000, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China
  • Online: 2020-08-01 Published: 2020-07-30

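Abstract:

UAV detection against complex ground-clutter backgrounds is a difficult problem within "low, small, and slow" target detection. To address the low accuracy of UAV detection algorithms caused by interference from surrounding objects and the small size of UAV targets, a deep-representation-based UAV target detection method for complex scenes is proposed. To address inaccurate localization of the UAV target, the generalized intersection over union (GIoU) is used to measure the deviation between the ground-truth position and the candidate position of the target. To address the poor learning caused by the imbalance between positive and negative samples and the abundance of easy samples, the modulating factor of the focal loss is used to reduce the loss contribution of negative and easy samples. The weights of the localization loss and the classification loss are adjusted to improve localization accuracy. To verify performance, a UAV dataset is built. Experiments show that the algorithm improves on YOLOv3 by 20.04% on the UAV data and substantially improves detection accuracy over SSD and RetinaNet on PASCAL VOC.

Keywords: complex scene, deep representation, generalized intersection over union loss, focal loss, loss weight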

Abstract:

Drone detection against complex ground backgrounds is a difficult problem in "low, small, and slow" target detection. Because of interference from environmental objects and the small size of the drone target, the accuracy of drone target detection algorithms is low. A drone target detection method for complex scenes based on deep representation is proposed. To address inaccurate localization of the drone, the generalized intersection over union (GIoU) is used to measure the deviation between the true position of the target and the candidate target position. To address the poor learning caused by the imbalance of positive and negative samples and the large number of easy samples, the modulating factor of the focal loss is used to reduce the loss contribution of negative samples and easy samples. The weights of the position loss and the category loss are adjusted to improve positional accuracy. In addition, to verify performance, a drone dataset is created. Experiments show that the proposed algorithm outperforms YOLOv3 by 20.04% on the drone data, and that its detection accuracy is better than that of SSD and RetinaNet on PASCAL VOC.

Key words: complex scene, deep representation, GIoU loss, focal loss, loss weight
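
The three loss ingredients named in the abstract (GIoU for localization, the focal-loss modulating factor for classification, and a weighted sum of the two) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names are hypothetical, the boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples with positive area, `alpha=0.25` and `gamma=2.0` are the commonly used focal-loss defaults rather than values taken from this paper, and the weights `w_loc` and `w_cls` are placeholders because the abstract does not state the values used.

```python
import math


def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2), in (-1, 1]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection rectangle; width or height is zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Smallest axis-aligned box that encloses both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    iou = inter / union
    # Unlike plain IoU, the penalty term stays informative for disjoint boxes.
    return iou - (enclose - union) / enclose


def giou_loss(pred_box, true_box):
    """Localization loss: 0 for a perfect match, approaching 2 for distant boxes."""
    return 1.0 - giou(pred_box, true_box)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p is the predicted probability of the positive class (0 < p < 1),
    y is the label (1 for positive, 0 for negative).
    alpha and gamma are assumed defaults, not values from the paper.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma is the modulating factor: it shrinks the loss of
    # well-classified (easy) samples so hard samples dominate training.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def total_loss(loc_loss, cls_loss, w_loc=2.0, w_cls=1.0):
    """Weighted sum of localization and classification losses (placeholder weights)."""
    return w_loc * loc_loss + w_cls * cls_loss


if __name__ == "__main__":
    print(giou((0, 0, 2, 2), (1, 1, 3, 3)))   # partially overlapping boxes
    print(giou((0, 0, 1, 1), (2, 0, 3, 1)))   # disjoint boxes give a negative GIoU
    print(focal_loss(0.1, 0), focal_loss(0.9, 0))  # easy vs. hard negative
```

For two disjoint boxes plain IoU is zero regardless of how far apart they are, whereas GIoU becomes more negative as the gap grows, so the localization loss still distinguishes near misses from distant ones. In the same spirit, the modulating factor leaves the loss of a confidently wrong prediction almost unchanged while scaling an easy negative down by orders of magnitude.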