[1] ZENG C, BAI C, MA Q, et al. Adversarial projection learning based hashing for cross-modal retrieval[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(6): 904-912.
[2] DAI J, CHEN Y. Joint linear discrimination and graph regularization for task-oriented cross-modal retrieval[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(1): 106-115.
[3] WEI X, ZHANG T Z, LI Y, et al. Multi-modality cross attention network for image and sentence matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 10938-10947.
[4] LIU Y, GUO Y Y, FANG J, et al. Survey of research on deep learning image-text cross-modal retrieval[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(3): 489-511.
[5] LI M Y, LI Y, HUANG S L, et al. Semantically supervised maximal correlation for cross-modal retrieval[C]//Proceedings of the 27th IEEE International Conference on Image Processing, 2020: 2291-2295.
[6] CHEN N, DUAN Y X, SUN Q F. Literature review of cross-modal retrieval research[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(8): 1390-1404.
[7] NGIAM J, KHOSLA A, KIM M, et al. Multimodal deep learning[C]//Proceedings of the International Conference on Machine Learning, 2011: 689-696.
[8] FENG F X, WANG X J, LI R F. Cross-modal retrieval with correspondence autoencoder[C]//Proceedings of the ACM International Conference on Multimedia, 2014: 7-16.
[9] WEI Y C, ZHAO Y, LU C, et al. Cross-modal retrieval with CNN visual features: a new baseline[J]. IEEE Transactions on Cybernetics, 2017, 47(2): 449-460.
[10] PENG Y X, HUANG X, QI J W. Cross-media shared representation by hierarchical learning with multiple deep networks[C]//Proceedings of the International Joint Conference on Artificial Intelligence, 2016: 3846-3853.
[11] WANG B K, YANG Y, XU X, et al. Adversarial cross-modal retrieval[C]//Proceedings of the ACM International Conference on Multimedia, 2017: 154-162.
[12] OU W, XUAN R, GOU J, et al. Semantic consistent adversarial cross-modal retrieval exploiting semantic similarity[J]. Multimedia Tools and Applications, 2020, 79(21/22): 14733-14750.
[13] ZHEN L L, HU P, WANG X, et al. Deep supervised cross-modal retrieval[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 10394-10403.
[14] XU B B, CEN K Y, HUANG J J, et al. A survey on graph convolutional neural network[J]. Chinese Journal of Computers, 2020, 43(5): 755-780.
[15] WU Z H, PAN S R, CHEN F W, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4-24.
[16] LI K, ZHANG Y, LI K, et al. Visual semantic reasoning for image-text matching[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 4653-4661.
[17] NORCLIFFE-BROWN W, VAFEIAS E, PARISOT S. Learning conditioned graph structures for interpretable visual question answering[C]//Proceedings of the International Conference on Neural Information Processing Systems, 2018: 8344-8353.
[18] CHEN M, WEI Z, HUANG Z, et al. Simple and deep graph convolutional networks[C]//Proceedings of the International Conference on Machine Learning, 2020: 1725-1735.
[19] WANG W, ARORA R, LIVESCU K, et al. On deep multi-view representation learning[C]//Proceedings of the International Conference on Machine Learning, 2015: 1083-1092.
[20] RASIWASIA N, COSTA-PEREIRA J, COVIELLO E, et al. A new approach to cross-modal multimedia retrieval[C]//Proceedings of the ACM International Conference on Multimedia, 2010: 251-260.
[21] RASHTCHIAN C, YOUNG P, HODOSH M, et al. Collecting image annotations using Amazon's Mechanical Turk[C]//Proceedings of the Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, 2010: 139-147.
[22] KINGMA D P, BA J. Adam: a method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.
[23] HOTELLING H. Relations between two sets of variates[M]//Breakthroughs in Statistics: Methodology and Distribution. New York: Springer, 1992: 162-190.
[24] RUPNIK J, SHAWE-TAYLOR J. Multi-view canonical correlation analysis[C]//Proceedings of the Slovenian KDD Conference on Data Mining and Data Warehouses (SiKDD), 2010: 1-4.
[25] KAN M, SHAN S, ZHANG H, et al. Multi-view discriminant analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(1): 188-194.
[26] ZHAI X, PENG Y, XIAO J. Learning cross-media joint representation with sparse and semisupervised regularization[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(6): 965-978.
[27] PENG Y X, QI J W, YUAN Y X. CM-GANs: cross-modal generative adversarial networks for common representation learning[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2019, 15(1): 1-24.
[28] ANDREW G, ARORA R, BILMES J, et al. Deep canonical correlation analysis[C]//Proceedings of the International Conference on Machine Learning, 2013: 1247-1255.