Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (5): 247-255. DOI: 10.3778/j.issn.1002-8331.2109-0356

• Graphics and Image Processing •

SSD Small Target Detection Algorithm Combining Feature Enhancement and Self-Attention

ZHANG Xinyue (张馨月), JIANG Ailian (降爱莲)

  1. School of Information and Computer Science, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Online: 2022-03-01 Published: 2022-03-01

Abstract: SSD is a multi-scale target detection algorithm, but its shallow feature maps lack semantic information, which leads to low detection accuracy for small targets. To address this problem, this paper proposes FA-SSD, an SSD small target detection algorithm that combines feature enhancement and self-attention. Building on SSD, the algorithm constructs a recursive reverse path from deep to shallow layers that consists of three modules. The deep feature enhancement module uses the contextual information generated from the deep multi-scale feature maps on the path, together with the semantic information of the deepest feature map, to strengthen the representation of deep features. The up-sampling feature enhancement module enlarges the receptive field of the feature maps to enrich the semantic information of the up-sampled feature maps in the reverse path. The adaptive feature fusion module introduces a self-attention mechanism to adaptively fuse each shallow feature map with its adjacent up-sampled feature map, generating new feature maps with both strong semantics and precise location information. Experimental results show that the mAP of FA-SSD reaches up to 92.5% on PASCAL VOC and 80.2% on TT100K, indicating that the algorithm enhances the semantic information of shallow feature maps and detects small targets in complex scenes well.

Key words: small target detection, feature enhancement, self-attention mechanism, feature fusion, context information
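
The deep-to-shallow reverse path described in the abstract can be illustrated with a minimal sketch. This is not the FA-SSD implementation: the abstract gives no layer-level details, so everything here is an assumption made for illustration. The function names (`upsample_nearest_2x`, `fuse_adaptive`, `reverse_path`) are hypothetical, feature maps are single-channel 2-D lists, each level is assumed to be exactly twice the size of the next deeper one, and a simple per-position softmax gate stands in for the paper's self-attention mechanism. The two feature enhancement modules are omitted.

```python
import math


def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse_adaptive(shallow, upsampled):
    """Blend two same-sized maps with per-position softmax weights.

    Assumption: the raw activations act as attention scores. A trained
    model would predict these scores with learned layers instead.
    """
    fused = []
    for row_s, row_u in zip(shallow, upsampled):
        fused_row = []
        for s, u in zip(row_s, row_u):
            m = max(s, u)                          # subtract max for stability
            e_s, e_u = math.exp(s - m), math.exp(u - m)
            w_s = e_s / (e_s + e_u)                # weight of the shallow map
            fused_row.append(w_s * s + (1.0 - w_s) * u)
        fused.append(fused_row)
    return fused


def reverse_path(pyramid):
    """Walk from the deepest map to the shallowest, fusing at each level.

    `pyramid` is ordered shallow -> deep. Each level is assumed to be
    exactly twice the height and width of the next one.
    """
    current = pyramid[-1]
    outputs = [current]
    for shallow in reversed(pyramid[:-1]):
        current = fuse_adaptive(shallow, upsample_nearest_2x(current))
        outputs.append(current)
    return outputs[::-1]                           # shallow -> deep order


if __name__ == "__main__":
    pyramid = [
        [[0.0] * 4 for _ in range(4)],             # shallow 4x4 map
        [[1.0, 2.0], [3.0, 4.0]],                  # middle 2x2 map
        [[5.0]],                                   # deepest 1x1 map
    ]
    for level in reverse_path(pyramid):
        print(len(level), "x", len(level[0]))
```

Each fused value is a convex combination of the shallow and up-sampled activations, so information from the deepest level propagates to every shallower level while the spatial layout of the shallow map is preserved. A real detector would apply this per channel on tensors and learn the fusion weights.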