Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (23): 12-23.DOI: 10.3778/j.issn.1002-8331.2205-0160
• Research Hotspots and Reviews •
XU Wenwan, ZHOU Xiaoping, WANG Jia
Online: 2022-12-01
Published: 2022-12-01
徐文婉,周小平,王佳
XU Wenwan, ZHOU Xiaoping, WANG Jia. Overview of Cross-Modal Retrieval Technology[J]. Computer Engineering and Applications, 2022, 58(23): 12-23.
徐文婉, 周小平, 王佳. 跨模态检索技术研究综述[J]. 计算机工程与应用, 2022, 58(23): 12-23.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2205-0160