
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (17): 259-271.DOI: 10.3778/j.issn.1002-8331.2403-0179
• Graphics and Image Processing •
HAN Chunyu, MA Jun, SHA Honghan, XIAO Xin, LU Chenkai, YAN Xin, ZHANG Xia
Online: 2025-09-01
Published: 2025-09-01
韩春禹,马骏,沙洪涵,肖鑫,陆晨凯,颜鑫,张霞
HAN Chunyu, MA Jun, SHA Honghan, XIAO Xin, LU Chenkai, YAN Xin, ZHANG Xia. Textual Modality-Assisted RGB Salient Object Detection[J]. Computer Engineering and Applications, 2025, 61(17): 259-271.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2403-0179
| [1] | HAO Hefei, ZHANG Longhao, CUI Hongzhen, ZHU Xiaoyue, PENG Yunfeng, LI Xianghui. Review of Application of Deep Neural Networks in Human Pose Estimation [J]. Computer Engineering and Applications, 2025, 61(9): 41-60. |
| [2] | YAN Zhengjin, YE Zheng, GE Jun. Application of Multimodal Pre-Trained Models in Financial Invoice Information Extraction [J]. Computer Engineering and Applications, 2025, 61(9): 186-193. |
| [3] | CHEN Hong, YOU Yuzhu, JIN Haibo, WU Cong, ZOU Jiapeng. Fusion of Improved Sampling Technology and SRFCNN-BiLSTM Intrusion Detection Method [J]. Computer Engineering and Applications, 2025, 61(9): 315-324. |
| [4] | DONG Lei, WU Fuju, SHI Jianyong, PAN Longfei. Construction and Application of Multimodal Knowledge Graph in Construction Safety Field Based on Large Language Model [J]. Computer Engineering and Applications, 2025, 61(9): 325-333. |
| [5] | MENG Weichao, BIAN Chunjiang, NIE Hongbin. Method for Detecting Dim Small Infrared Targets with Low Signal-to-Noise Ratio in Complex Background [J]. Computer Engineering and Applications, 2025, 61(8): 183-193. |
| [6] | TIAN Kan, CAO Xinwen, ZHANG Haoran, XIAN Xingping, WU Tao, SONG Xiuli. Knowledge Graph Question Answering with Shared Encoding and Graph Convolution Networks [J]. Computer Engineering and Applications, 2025, 61(7): 233-244. |
| [7] | WU Bo, ZHANG Rongfen, LIU Yuhong. Research on RGB-T Multimodal Interaction Tracking Algorithm with Improved ViT [J]. Computer Engineering and Applications, 2025, 61(7): 267-277. |
| [8] | MA Yunyi, XU Ming, JIN Haibo. Multi-Channel Adaptive Feature Fusion for Urban Road Network Traffic Prediction [J]. Computer Engineering and Applications, 2025, 61(7): 334-341. |
| [9] | GUO Xiaoyu, MA Jing, CHEN Jie. Research on Multimodal Hierarchical Feature Mapping and Fusion Representation Method [J]. Computer Engineering and Applications, 2025, 61(6): 171-182. |
| [10] | LIANG Chengwu, HU Wei, YANG Jie, JIANG Songqi, HOU Ning. Fusion of Spatio-Temporal Domain Knowledge and Data-Driven for Skeleton-Based Action Recognition [J]. Computer Engineering and Applications, 2025, 61(5): 165-176. |
| [11] | PAN Weilan, ZHANG Rongfen, LIU Yuhong, ZHANG Jiyou, SUN Long. Cross-Modal Transparent Object Segmentation Combining CNN-Transformer [J]. Computer Engineering and Applications, 2025, 61(4): 222-229. |
| [12] | KANG Huanhuan, ZHANG Yuzhao, SHI Guyue. Research on Identification of Important Nodes in China-Europe Sea-Rail Transport Network from Perspective of Complex Network [J]. Computer Engineering and Applications, 2025, 61(4): 349-357. |
| [13] | JIANG Yuehan, CHEN Junjie, LI Hongjun. Review of Human Action Recognition Based on Skeletal Graph Neural Networks [J]. Computer Engineering and Applications, 2025, 61(3): 34-47. |
| [14] | LI Zehui, ZHANG Lin, SHAN Xianying. Review on Improvement and Application of 3D Convolutional Neural Networks [J]. Computer Engineering and Applications, 2025, 61(3): 48-61. |
| [15] | ZHAI Yunkai, LI Junke, LI Jinlin, QIAO Yan. Intelligent Triage of Teleconsultation Based on Multi-Feature Fusion [J]. Computer Engineering and Applications, 2025, 61(3): 326-335. |