Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (12): 210-221. DOI: 10.3778/j.issn.1002-8331.2407-0230

• Graphics and Image Processing •

基于高阶空间特征提取的无人机航拍小目标检测算法

张轩宇,周思航,黄健,王冬   

  1. 国防科技大学 智能科学学院,长沙 410028
  • 出版日期:2025-06-15 发布日期:2025-06-13

High-Order Spatial Feature Extraction Based Small Target Detection for UAV Aerial Photographs

ZHANG Xuanyu, ZHOU Sihang, HUANG Jian, WANG Dong   

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410028, China
  • Published: 2025-06-15 Online: 2025-06-13

摘要: 针对视觉算法在检测航拍图像中密集小目标时容易受到目标重叠、遮挡等情况干扰的现象,提出了一种基于高阶空间特征(目标形状、位置等信息的高级表示)提取的Transformer检测头HSF-TPH(Transformer prediction head with high-order spatial feature extraction)。所提检测头中将自注意力机制中的二阶交互扩展到三阶以生成高阶空间特征,提取更有区分度的空间关系,突出每一个小目标在空间上的语义信息。同时,为了缓解骨干网络过度下采样对小目标信息的压缩,设计了一种高分辨率特征图生成机制,增加头部网络的输入特征分辨率,以提升HSF-TPH检测密集小目标的效果。设计了新的损失函数USIoU,降低算法位置偏差敏感性。在VisDrone2019数据集上开展实验证明,所提算法在面积最小、密度最高的人类目标的检测任务中实现了mAP50指标10个百分点以上的性能提升。

关键词: 无人机航拍, 小目标检测, 高阶空间特征提取, 注意力机制, 损失函数

Abstract: Vision algorithms that detect dense small targets in aerial images are easily disturbed by target overlap and occlusion. To address this, a Transformer prediction head with high-order spatial feature extraction (HSF-TPH) is proposed, where high-order spatial features are high-level representations of a target's shape, position, and related information. The head extends the second-order interactions of the self-attention mechanism to third order to generate high-order spatial features, which extracts more discriminative spatial relationships and highlights the spatial semantic information of each small target. To alleviate the compression of small-target information caused by excessive down-sampling in the backbone network, a high-resolution feature map generation mechanism is designed. It increases the resolution of the features fed to the head network and thereby improves HSF-TPH's detection of dense small targets. A new loss function, USIoU, is designed to reduce the algorithm's sensitivity to positional deviations. Experiments on the VisDrone2019 dataset show that the proposed algorithm improves mAP50 by more than 10 percentage points on human targets, the category with the smallest area and highest density.

Key words: UAV aerial photography, small target detection, high-order spatial feature extraction, attention mechanism, loss function
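
The abstract's concern with positional deviation can be illustrated with plain IoU, the overlap measure behind the mAP50 metric (a detection counts as correct at an IoU of at least 0.5). The sketch below is not the paper's USIoU loss, whose definition is not given here. It only shows, under the assumption of axis-aligned square boxes and a hypothetical 3-pixel localization error, why the same offset costs a small target far more IoU than a large one. The function names `iou` and `shifted_iou` are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a size-by-size ground-truth box and the same box shifted diagonally."""
    ground_truth = (0, 0, size, size)
    prediction = (shift, shift, size + shift, size + shift)
    return iou(ground_truth, prediction)


# Same 3-pixel error (an assumed value), two target sizes.
small = shifted_iou(size=10, shift=3)    # about 0.32, below the 0.5 threshold
large = shifted_iou(size=100, shift=3)   # about 0.89, comfortably above it

print(f"10x10 target:   IoU = {small:.3f}")
print(f"100x100 target: IoU = {large:.3f}")
```

With these assumed numbers the small target falls below the mAP50 threshold while the large one remains a clear match, which is the kind of sensitivity a loss designed for small targets aims to reduce.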