Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (19): 122-129. DOI: 10.3778/j.issn.1002-8331.2205-0142

• Pattern Recognition and Artificial Intelligence •

Lightweight Saliency Object Detection Guided by Deep Feature Aggregation

LI Junwen, ZHANG Hongying, HAN Bin

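  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  2. Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  • Publication date: 2023-10-01 Release date: 2023-10-01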

Lightweight Saliency Object Detection Guided by Deep Feature Aggregation

LI Junwen, ZHANG Hongying, HAN Bin   

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  2. Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Mianyang, Sichuan 621010, China
  • Online:2023-10-01 Published:2023-10-01

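Abstract: Most current research on salient object detection pursues accuracy while neglecting efficiency, which limits its practical use. To address this, an efficient and lightweight network model is proposed. A feature extraction sub-network (LFRM) is built on the idea of feature reuse to fully extract and aggregate the deep features of a lightweight feature extraction network and to generate an initial coarse saliency prediction map, which then guides target localization in the subsequent low-level features. To handle the differences between the feature layers of each stage, a cross-layer interactive aggregation module (CIAM) is built to aggregate spatial and semantic information effectively while reducing redundant information. An edge refinement module (ERM) is built to fully capture and exploit edge contour information, and a progressive self-guided loss is adopted to strengthen the mutual dependence of the edge information. The final network has only 3.48×10^6 parameters and, for a 352×352 image, runs at 108 FPS on a single GTX 1080Ti graphics card. Tests on five public benchmark datasets show that the proposed model performs comparably to, or even better than, current state-of-the-art SOD methods while having fewer parameters and higher speed.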

Key words: salient object detection, lightweight, feature extraction, edge information

Abstract: Most research on salient object detection (SOD) pursues performance while ignoring efficiency, which results in poor practicability. To this end, this paper proposes an efficient and lightweight network model. Firstly, a feature extraction sub-network (LFRM) is constructed using the idea of feature reuse to fully extract and aggregate the deep feature information of the lightweight feature extraction network and to generate an initial coarse saliency prediction map, which guides target localization in the subsequent low-level features. Secondly, to account for the differences between the feature layers at each stage, a cross-layer interactive aggregation module (CIAM) is constructed to aggregate spatial and semantic information effectively and to reduce redundant information. Finally, an edge refinement module (ERM) is constructed to fully obtain and utilize edge contour information, and a progressive self-guided loss is adopted to strengthen the mutual dependence of the edge information. The final network has only 3.48×10^6 parameters and, for a 352×352 image, reaches a running speed of 108 FPS on a single GTX 1080Ti graphics card. Test results on five public benchmark datasets show that the proposed model achieves performance comparable to or even better than current state-of-the-art SOD methods, with fewer parameters and faster speed.

Key words: saliency object detection, lightweight, feature extraction, edge information
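
The efficiency figures quoted in the abstract (3.48×10^6 parameters, 108 FPS at 352×352) can be restated as per-image latency, approximate weight storage, and pixel throughput. The short sketch below only does that arithmetic; it is not the authors' benchmarking code. The helper names are hypothetical, and the 4-bytes-per-parameter figure assumes 32-bit floating-point weights, which the abstract does not state.

```python
# Figures quoted in the abstract.
NUM_PARAMS = 3.48e6          # network parameters
FPS = 108                    # frames per second on a single GTX 1080Ti
INPUT_SIZE = (352, 352)      # input height and width in pixels

# Assumption (not stated in the abstract): weights stored as 32-bit floats.
BYTES_PER_PARAM = 4


def latency_ms(fps: float) -> float:
    """Average time per image in milliseconds implied by a frame rate."""
    return 1000.0 / fps


def model_size_mb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


def pixels_per_second(fps: float, size: tuple) -> float:
    """Input pixels processed per second at the given frame rate."""
    height, width = size
    return fps * height * width


if __name__ == "__main__":
    print(f"latency per image: {latency_ms(FPS):.2f} ms")
    print(f"weight storage:    {model_size_mb(NUM_PARAMS):.2f} MB (float32 assumed)")
    print(f"pixel throughput:  {pixels_per_second(FPS, INPUT_SIZE):,.0f} pixels/s")
```

Under these assumptions the network needs a little over 9 ms per image and roughly 14 MB of weight storage. Half-precision or quantized weights would lower the storage figure proportionally.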