Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (12): 203-215. DOI: 10.3778/j.issn.1002-8331.2309-0272

• Graphics and Image Processing •

InternDiffuseDet: Object Detection Method Combining Deformable Convolution and Diffusion Model

YUAN Zhixiang, GAO Yongqi

  1. School of Computer Science and Technology, Anhui University of Technology, Ma’anshan, Anhui 243032, China
  • Publication date: 2024-06-15 Release date: 2024-06-14

Abstract: To address missed and false detections, limited feature extraction capability, and low accuracy in complex scenes in existing object detectors, this paper improves on DiffusionDet and proposes an object detection method that combines deformable convolution with a diffusion model. The core idea is to supply the detection head with more, and higher-quality, feature maps. InternImage and the DCNv3 deformable convolution operator are introduced into the backbone network to enlarge the receptive field and strengthen the model's non-linear modeling capability. The intermediate FPN is improved into CS-FPN, a feature pyramid based on selective weighting: depthwise separable convolutions separate channel and spatial (region) processing, the CARAFE operator replaces conventional upsampling to improve resolution and the transfer of semantic information, and the SGE attention mechanism then reassembles the feature maps so that more hierarchical information is preserved during diffusion. Before the feature maps enter the detection head, a DDIM diffusion operation produces feature maps at different time steps, increasing the number of feature maps available for detection. Finally, EIOU is used in box matching and in the loss function to handle position offsets and scale differences between boxes. On the COCO dataset and a road detection dataset, the improved model outperforms the original model by 3.8 and 3.6 percentage points, respectively, under the same experimental settings. These results suggest that the method has potential to improve the accuracy and robustness of object detection and offers a new approach to detection problems in real-world scenes.

Key words: DiffusionDet, deformable convolution, diffusion model, feature pyramid, loss function
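
The abstract names EIOU as the measure used for box matching and in the loss, but gives no formula. As an illustration only, the sketch below follows the commonly published EIoU formulation: 1 - IoU, plus a center-distance penalty normalized by the diagonal of the smallest enclosing box, plus width and height penalties normalized by that box's width and height. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example; it is not the authors' implementation.

```python
def eiou_loss(box, gt, eps=1e-9):
    """Illustrative EIoU loss for two axis-aligned boxes in (x1, y1, x2, y2) format."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt

    # Widths and heights of the predicted and ground-truth boxes.
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the enclosing box (penalizes position offset).
    dx = (x1 + x2) / 2.0 - (gx1 + gx2) / 2.0
    dy = (y1 + y2) / 2.0 - (gy1 + gy2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, normalized by the enclosing box
    # dimensions (penalizes scale difference).
    width_term = (w - gw) ** 2 / (cw * cw + eps)
    height_term = (h - gh) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    print(eiou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)))  # identical boxes
    print(eiou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0)))  # disjoint boxes
```

Unlike plain IoU, the center, width, and height terms remain informative when the boxes do not overlap. This is one plausible reading of the abstract's statement that EIOU handles position offsets and scale differences between boxes.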