Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (11): 242-250. DOI: 10.3778/j.issn.1002-8331.2302-0176

• Graphics and Image Processing •

Multi-Scale Feature Fusion for Salient Object Detection in RGB-D Images

WANG Zhen, YU Wanjun, CHEN Ying   

  1. School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
  • Online: 2024-06-01  Published: 2024-05-31

Abstract: Salient object detection is a fundamental problem in computer vision. Many current deep-learning-based saliency detection methods fuse RGB images and depth maps through input fusion or result fusion, but neither strategy fuses the feature maps effectively. To improve the performance of salient object detection, a multi-scale feature fusion method for RGB-D images is proposed. The main body of the model consists of two feature encoders, two feature decoders, and a cross-modal multi-scale interleaved feature fusion module. The two encoders, ResNet50 networks pre-trained on the ImageNet dataset, process the RGB image and the depth map respectively; the feature decoders decode the encoders' outputs at five different scales; and the cross-modal fusion module fuses the feature maps of different scales extracted by the encoders and decoders, concatenates the five levels of fusion results, reduces their dimensionality, and outputs the final saliency prediction map. Experiments on four public saliency datasets compare the model with ten representative prior models. Relative to the second-best model, the proposed model improves S-measure by 0.391% on average, reduces MAE by 0.330% on average, and improves F-measure by 0.405% on average across the datasets. The proposed multi-scale feature fusion model abandons the previous fusion strategies and instead interleaves the fusion of shallow and deep features; experiments show that it delivers stronger performance and better results than previous methods.
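The described architecture can be summarized in code. The following is a minimal PyTorch sketch, assuming standard torchvision components; the module and parameter names (Encoder, Fusion, mid, etc.) are hypothetical illustrations rather than the authors' implementation, and the fusion step is simplified to per-scale concatenation with 1x1 convolutions in place of the paper's full interleaved design.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class Encoder(nn.Module):
    # ResNet50 backbone exposing five feature scales (1/2 to 1/32 resolution).
    def __init__(self):
        super().__init__()
        net = resnet50(weights="IMAGENET1K_V1")  # ImageNet pre-training, as in the paper
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)  # 64 ch,   1/2
        self.layer1 = nn.Sequential(net.maxpool, net.layer1)     # 256 ch,  1/4
        self.layer2 = net.layer2                                 # 512 ch,  1/8
        self.layer3 = net.layer3                                 # 1024 ch, 1/16
        self.layer4 = net.layer4                                 # 2048 ch, 1/32

    def forward(self, x):
        f1 = self.stem(x)
        f2 = self.layer1(f1)
        f3 = self.layer2(f2)
        f4 = self.layer3(f3)
        f5 = self.layer4(f4)
        return [f1, f2, f3, f4, f5]  # five scales, shallow to deep

class Fusion(nn.Module):
    # Simplified stand-in for the cross-modal multi-scale fusion module:
    # per scale, concatenate RGB and depth features, reduce with a 1x1 conv,
    # upsample to the input size, then concatenate all five levels and
    # reduce to a single-channel saliency prediction.
    def __init__(self, channels=(64, 256, 512, 1024, 2048), mid=64):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(2 * c, mid, 1) for c in channels)
        self.head = nn.Conv2d(5 * mid, 1, 1)

    def forward(self, rgb_feats, depth_feats, size):
        fused = [F.interpolate(conv(torch.cat([fr, fd], dim=1)), size=size,
                               mode="bilinear", align_corners=False)
                 for conv, fr, fd in zip(self.reduce, rgb_feats, depth_feats)]
        return self.head(torch.cat(fused, dim=1))

class RGBDSaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_enc, self.depth_enc, self.fusion = Encoder(), Encoder(), Fusion()

    def forward(self, rgb, depth):
        if depth.shape[1] == 1:               # replicate single-channel depth
            depth = depth.repeat(1, 3, 1, 1)  # to fit the 3-channel ResNet stem
        return self.fusion(self.rgb_enc(rgb), self.depth_enc(depth), rgb.shape[2:])

Usage: for a 256x256 RGB image and its depth map, the network returns a 1x1x256x256 logit map whose sigmoid is the saliency prediction.

net = RGBDSaliencyNet().eval()
with torch.no_grad():
    logits = net(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
print(logits.shape)  # torch.Size([1, 1, 256, 256])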

Key words: salient object detection (SOD), multimodal image fusion, multi-path collaborative prediction, multi-scale features
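For reference, the metrics reported in the abstract are standard in the SOD literature. Below is a minimal NumPy sketch of MAE and F-measure under their conventional definitions (beta^2 = 0.3, adaptive threshold of twice the mean saliency value); this is a generic illustration, not the authors' evaluation code, and S-measure is omitted for brevity.

import numpy as np

def mae(pred, gt):
    # Mean absolute error between a saliency map and ground truth, both in [0, 1].
    return np.abs(pred - gt).mean()

def f_measure(pred, gt, beta2=0.3):
    # F-measure with the common adaptive threshold (twice the mean saliency).
    thresh = min(2.0 * pred.mean(), 1.0)
    binary = pred >= thresh
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

# Example on random data:
pred = np.random.rand(256, 256)
gt = (np.random.rand(256, 256) > 0.5).astype(float)
print(mae(pred, gt), f_measure(pred, gt))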