Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (11): 242-250. DOI: 10.3778/j.issn.1002-8331.2302-0176

• Graphics and Image Processing •

Multi-Scale Feature Fusion for Salient Object Detection in RGB-D Images

WANG Zhen, YU Wanjun, CHEN Ying   

  1. School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
  • Online: 2024-06-01  Published: 2024-05-31

Abstract: Salient object detection is a fundamental problem in computer vision. Many current deep-learning-based saliency detection methods fuse RGB images and depth maps through input fusion or result fusion, but neither strategy fuses the feature maps effectively. To improve the performance of salient object detection, a multi-scale feature fusion method for RGB-D images is proposed. The main body of the model consists of two feature encoders, two feature decoders, and a cross-modal multi-scale interleaved feature fusion module. The two encoders, ResNet50 networks pre-trained on the ImageNet dataset, process the RGB image and the depth map respectively; the feature decoders decode the encoders' outputs at five different scales; and the cross-modal fusion module fuses the feature maps of different scales extracted by the encoders and decoders, concatenates the five levels of fusion results, reduces their dimensionality, and outputs the final saliency prediction map. Experiments on four public saliency datasets compare the model with ten representative prior models. Relative to the second-best model, the proposed model improves S-measure by 0.391% on average, reduces MAE by 0.330% on average, and improves F-measure by 0.405% on average across the datasets. The proposed multi-scale feature fusion model abandons the previous fusion strategies and instead interleaves the fusion of shallow and deep features; experiments show that it delivers stronger performance and better results than previous methods.
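The described architecture can be summarized in code. The following is a minimal PyTorch sketch, assuming standard torchvision components; the module and parameter names (Encoder, Fusion, mid, etc.) are hypothetical illustrations rather than the authors' implementation, and the fusion step is simplified to per-scale concatenation with 1x1 convolutions in place of the paper's full interleaved design.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class Encoder(nn.Module):
    # ResNet50 backbone exposing five feature scales (1/2 to 1/32 resolution).
    def __init__(self):
        super().__init__()
        net = resnet50(weights="IMAGENET1K_V1")  # ImageNet pre-training, as in the paper
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)  # 64 ch,   1/2
        self.layer1 = nn.Sequential(net.maxpool, net.layer1)     # 256 ch,  1/4
        self.layer2 = net.layer2                                 # 512 ch,  1/8
        self.layer3 = net.layer3                                 # 1024 ch, 1/16
        self.layer4 = net.layer4                                 # 2048 ch, 1/32

    def forward(self, x):
        f1 = self.stem(x)
        f2 = self.layer1(f1)
        f3 = self.layer2(f2)
        f4 = self.layer3(f3)
        f5 = self.layer4(f4)
        return [f1, f2, f3, f4, f5]  # five scales, shallow to deep

class Fusion(nn.Module):
    # Simplified stand-in for the cross-modal multi-scale fusion module:
    # per scale, concatenate RGB and depth features, reduce with a 1x1 conv,
    # upsample to the input size, then concatenate all five levels and
    # reduce to a single-channel saliency prediction.
    def __init__(self, channels=(64, 256, 512, 1024, 2048), mid=64):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(2 * c, mid, 1) for c in channels)
        self.head = nn.Conv2d(5 * mid, 1, 1)

    def forward(self, rgb_feats, depth_feats, size):
        fused = [F.interpolate(conv(torch.cat([fr, fd], dim=1)), size=size,
                               mode="bilinear", align_corners=False)
                 for conv, fr, fd in zip(self.reduce, rgb_feats, depth_feats)]
        return self.head(torch.cat(fused, dim=1))

class RGBDSaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_enc, self.depth_enc, self.fusion = Encoder(), Encoder(), Fusion()

    def forward(self, rgb, depth):
        if depth.shape[1] == 1:               # replicate single-channel depth
            depth = depth.repeat(1, 3, 1, 1)  # to fit the 3-channel ResNet stem
        return self.fusion(self.rgb_enc(rgb), self.depth_enc(depth), rgb.shape[2:])

Usage: for a 256x256 RGB image and its depth map, the network returns a 1x1x256x256 logit map whose sigmoid is the saliency prediction.

net = RGBDSaliencyNet().eval()
with torch.no_grad():
    logits = net(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
print(logits.shape)  # torch.Size([1, 1, 256, 256])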

Key words: salient object detection (SOD), multimodal image fusion, multi-path collaborative prediction, multi-scale features
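For reference, the metrics reported in the abstract are standard in the SOD literature. Below is a minimal NumPy sketch of MAE and F-measure under their conventional definitions (beta^2 = 0.3, adaptive threshold of twice the mean saliency value); this is a generic illustration, not the authors' evaluation code, and S-measure is omitted for brevity.

import numpy as np

def mae(pred, gt):
    # Mean absolute error between a saliency map and ground truth, both in [0, 1].
    return np.abs(pred - gt).mean()

def f_measure(pred, gt, beta2=0.3):
    # F-measure with the common adaptive threshold (twice the mean saliency).
    thresh = min(2.0 * pred.mean(), 1.0)
    binary = pred >= thresh
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

# Example on random data:
pred = np.random.rand(256, 256)
gt = (np.random.rand(256, 256) > 0.5).astype(float)
print(mae(pred, gt), f_measure(pred, gt))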