Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (13): 300-308. DOI: 10.3778/j.issn.1002-8331.2403-0374

• Graphics and Image Processing •

Fusion of Translation Dependency and Dual-Dimensional Small Object Detection Network

YU Longkun, WANG Zihao, SHEN Hong, ZHAN Qiangbo

  1. School of Information Engineering, Nanchang University, Nanchang 330047, China
  2. School of Advanced Manufacturing, Nanchang University, Nanchang 330047, China
  • Online: 2025-07-01  Published: 2025-06-30

Abstract: Small objects in detection tasks are small in size, densely packed, and arranged in complex layouts, which makes missed and false detections likely. To address this problem, this paper proposes a hybrid attention mechanism that fuses translation dependency with dual-dimensional attention. The aim is to keep the model lightweight while extracting small-object features more completely and processing them in a more targeted way. An adaptive coordinate information module (CIM) adds coordinate encoding to the feature map, completing the position information of each object. An adaptive batch normalization attention module (BNM) and an optimization-excitation module (OEM) assign more targeted weights to the features within the input data and to the relationships among them, helping the model focus on the most relevant information when handling complex tasks. Finally, a structure with one main branch and two secondary branches combines the three modules into the translation and dual-dimensional hybrid attention (TDHA) algorithm. TDHA is compatible with any object detection network, which makes it general-purpose. To validate its performance, experiments were conducted on the VisDrone2021 drone-imagery dataset. The results show that adding TDHA improves the baseline model's mAP0.5 by 1.6 percentage points and its mAP0.5:0.95 by 1 percentage point, and that its mAP0.5 is 1.4 percentage points higher than that of YOLOv8.

Key words: small object detection, translation dependency, attention mechanism, lightweight
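
The abstract says that CIM adds coordinate encoding to the feature map but gives no implementation details. The sketch below shows one common way to do this, in the style of CoordConv: two extra channels holding normalized row and column positions are appended to the feature map. It is an illustration under that assumption, not the authors' CIM. The function name `add_coord_channels`, the `(C, H, W)` layout, and the `[-1, 1]` scaling are all hypothetical choices made for this example.

```python
import numpy as np


def add_coord_channels(feature_map: np.ndarray) -> np.ndarray:
    """Append normalized y and x coordinate channels to a (C, H, W) float array.

    Hypothetical helper for illustration only; not the paper's CIM.
    """
    channels, height, width = feature_map.shape
    # Scale coordinates to [-1, 1] so they do not depend on the resolution.
    ys = np.linspace(-1.0, 1.0, height, dtype=feature_map.dtype)
    xs = np.linspace(-1.0, 1.0, width, dtype=feature_map.dtype)
    # indexing="ij" yields (H, W) grids: y varies by row, x varies by column.
    y_grid, x_grid = np.meshgrid(ys, xs, indexing="ij")
    coords = np.stack([y_grid, x_grid], axis=0)  # shape (2, H, W)
    # The original channels come first, followed by the y and x channels.
    return np.concatenate([feature_map, coords], axis=0)  # shape (C + 2, H, W)


features = np.zeros((3, 4, 5), dtype=np.float32)
encoded = add_coord_channels(features)
print(encoded.shape)  # (5, 4, 5)
```

A plain convolution responds identically to the same patch wherever it appears in the image, so on its own it carries no explicit notion of absolute position. Appending coordinate channels gives later layers access to that position at the cost of only two extra input channels, which fits the abstract's stated goal of keeping the model lightweight.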