Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (2): 211-220. DOI: 10.3778/j.issn.1002-8331.2208-0267

• Graphics and Image Processing •

RGB-D Salient Object Detection Based on Multi-Modal Feature Interaction

GAO Yue (高悦), DAI Meng (戴蒙), ZHANG Qing (张晴)

  1. School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
  • Online: 2024-01-15 Published: 2024-01-15

Abstract: Most existing RGB-D salient object detection methods use depth maps to improve detection but ignore the influence of depth-map quality. Low-quality depth maps contaminate the final saliency prediction and degrade detection performance. To eliminate the interference caused by low-quality depth maps and accurately highlight salient objects in RGB images, an RGB-D salient object detection model based on multi-modal feature interaction is proposed. In the encoding stage, a feature interaction module is designed, consisting of three sub-modules: a global feature capture sub-module that enhances feature representation, a depth feature refinement sub-module that filters out low-quality depth information, and a multi-modal feature interaction sub-module that performs feature fusion. In the decoding stage, the multi-modal features produced by feature interaction are fused layer by layer to achieve multi-level feature fusion. Comprehensive experiments against twelve advanced methods on five benchmark datasets show that the proposed model outperforms the compared methods on the NLPR, SIP and NJU2K datasets. On NJU2K, the model improves on the second-best method by 0.008 in average F-measure, 0.014 in weighted F-measure and 0.007 in E-measure, demonstrating good detection performance.

Key words: RGB-D salient object detection, multi-modal feature, feature interaction, feature fusion
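
For readers unfamiliar with the F-measure scores reported in the abstract, the following is a minimal sketch of how an F-measure is commonly computed for a saliency map. It is not taken from the paper. The function names, the binarization thresholds, and the weighting `beta_sq = 0.3` (a value widely used in saliency detection work) are assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
# Illustrative sketch only: names, thresholds and beta_sq are assumptions,
# not the evaluation code used by the authors.

def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of a saliency map binarized at `threshold`.

    pred: 2-D list of saliency scores in [0, 1]
    gt:   2-D list of ground-truth labels (1 = salient, 0 = background)
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for score, label in zip(pred_row, gt_row):
            predicted_salient = score >= threshold
            if predicted_salient and label:
                tp += 1          # salient pixel correctly detected
            elif predicted_salient:
                fp += 1          # background pixel marked salient
            elif label:
                fn += 1          # salient pixel missed
    if tp == 0:
        return 0.0               # precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def mean_f_measure(pred, gt, thresholds=None, beta_sq=0.3):
    """Average of the F-measure over a sweep of binarization thresholds."""
    if thresholds is None:
        # Assumed sweep: 256 evenly spaced thresholds, as for 8-bit maps.
        thresholds = [t / 255 for t in range(256)]
    scores = [f_measure(pred, gt, t, beta_sq) for t in thresholds]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    prediction = [[0.9, 0.2], [0.8, 0.1]]
    ground_truth = [[1, 0], [0, 0]]
    print(f_measure(prediction, ground_truth))
    print(mean_f_measure(prediction, ground_truth))
```

With `beta_sq` below 1, precision is weighted more heavily than recall, so a map that marks background pixels as salient is penalized more than one that misses part of the object.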