RGB-D Salient Object Detection Based on Multi-Modal Feature Interaction

doi:10.3778/j.issn.1002-8331.2208-0267

Abstract

Abstract: Most existing RGB-D salient object detection methods use depth maps to improve detection but ignore the influence of depth-map quality. A low-quality depth map contaminates the final saliency prediction and degrades detection performance. To eliminate the interference caused by low-quality depth maps and accurately highlight salient objects in RGB images, an RGB-D salient object detection model based on multi-modal feature interaction is proposed. In the encoding stage, a feature interaction module is designed that consists of three sub-modules: a global feature capture sub-module that enhances feature representation, a depth feature refinement sub-module that filters out low-quality depth information, and a multi-modal feature interaction sub-module that performs feature fusion. In the decoding stage, the multi-modal features produced by feature interaction are fused layer by layer to achieve multi-level feature fusion. Comprehensive experiments against twelve advanced methods on five benchmark datasets show that the proposed model outperforms the compared methods on the NLPR, SIP and NJU2K datasets. On NJU2K, the proposed model improves on the second-best method by 0.008 in average F-measure, 0.014 in weighted F-measure and 0.007 in E-measure.

Key words: RGB-D salient object detection, multi-modal feature, feature interaction, feature fusion
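
For readers unfamiliar with the F-measure reported in the abstract, the following is a minimal sketch of how it is commonly computed in salient object detection. It assumes a single fixed binarization threshold and the conventional weighting β² = 0.3; the function name `f_measure` and its parameters are illustrative and are not taken from the paper, whose evaluation code is not shown here. An average F-measure would typically be obtained by averaging such values over several thresholds and over all test images.

```python
def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of a saliency map against a binary ground-truth mask.

    pred: 2-D list of saliency scores in [0, 1]
    gt:   2-D list of 0/1 labels with the same shape
    threshold and beta_sq are assumed defaults, not values from the paper.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for score, label in zip(pred_row, gt_row):
            detected = score >= threshold  # binarize the saliency score
            if detected and label:
                tp += 1  # salient pixel correctly detected
            elif detected:
                fp += 1  # background pixel marked as salient
            elif label:
                fn += 1  # salient pixel missed
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 2x2 example: two salient pixels found, one false positive.
pred = [[0.9, 0.8], [0.2, 0.6]]
gt = [[1, 1], [0, 0]]
score = f_measure(pred, gt)
```

With β² below 1, precision is weighted more heavily than recall, which is why the single false positive in the toy example lowers the score noticeably even though recall is perfect.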

GAO Yue, DAI Meng, ZHANG Qing. RGB-D Salient Object Detection Based on Multi-Modal Feature Interaction[J]. Computer Engineering and Applications, 2024, 60(2): 211-220.
