Cross-Modal Hashing Network Based on Multimodal Attention Mechanism
WU Jixiang, LU Qin, LI Weixiao
1.College of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250000, China
2.Internal Audit Department, China Mobile Information Technology Co., Ltd., Beijing 100000, China
WU Jixiang, LU Qin, LI Weixiao. Cross-Modal Hashing Network Based on Multimodal Attention Mechanism[J]. Computer Engineering and Applications, 2022, 58(20): 229-239.
[1] QIN Q,WEI Z,HUANG L,et al.Deep top similarity hashing with class-wise loss for multi-label image retrieval[J].Neurocomputing,2021,439:302-315.
[2] PENG D,YANG W,LIU C,et al.SAM-GAN:self-attention supporting multi-stage generative adversarial networks for text-to-image synthesis[J].Neural Networks,2021,138:57-67.
[3] ZHAO G,ZHANG M,LI Y,et al.Pyramid regional graph representation learning for content-based video retrieval[J].Information Processing & Management,2021,58(3):102488.
[4] ZANGERLE E,PICHL M,SCHEDL M.User models for culture-aware music recommendation:fusing acoustic and cultural cues[J].Transactions of the International Society for Music Information Retrieval,2020,3(1):1-16.
[5] DAGA I,GUPTA A,VARDHAN R,et al.Prediction of likes and retweets using text information retrieval[J].Procedia Computer Science,2020,168:123-128.
[6] PADMAPRIYA G,DURAISWAMY K.Multi-document-based text summarisation through deep learning algorithm[J].International Journal of Business Intelligence and Data Mining,2020,16(4):459-479.
[7] KIVRAK M,GULDOGAN E,COLAK C.Prediction of death status on the course of treatment in SARS-COV-2 patients with deep learning and machine learning methods[J].Computer Methods and Programs in Biomedicine,2021,201:105951.
[8] KOSE U,DEPERLIOGLU O,HEMANTH D J.Deep learning for biomedical applications[M].[S.l.]:CRC Press,2021.
[9] LEE K,CHEN X,HUA G,et al.Stacked cross attention for image-text matching[C]//European Conference on Computer Vision,2018:201-216.
[10] WANG X,ZOU X,BAKKER E M,et al.Self-constraining and attention-based hashing network for bit-scalable cross-modal retrieval[J].Neurocomputing,2020,400:255-271.
[11] YUAN M,PENG Y.Bridge-GAN:interpretable representation learning for text-to-image synthesis[J].IEEE Transactions on Circuits and Systems for Video Technology,2019,30(11):4258-4268.
[12] QU W,WANG D,FENG S,et al.A novel cross-modal hashing algorithm based on multimodal deep learning[J].Science China Information Sciences,2017,60(9):1-14.
[13] DING G,GUO Y,ZHOU J.Collective matrix factorization hashing for multimodal data[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition(CVPR),2014.
[14] ZHANG D,LI W J.Large-scale supervised multimodal hashing with semantic correlation maximization[C]//Twenty-Eighth AAAI Conference on Artificial Intelligence,2014.
[15] WANG D,GAO X,WANG X.Semantic topic multimodal hashing for cross-media retrieval[C]//Proceedings of the 24th International Joint Conference on Artificial Intelligence,2015:3890-3896.
[16] LIN Z,DING G,HU M,et al.Semantics-preserving hashing for cross-view retrieval[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition(CVPR),2015.
[17] JIANG Q Y,LI W J.Deep cross-modal hashing[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR),2017:3232-3240.
[18] LIN Q,CAO W,HE Z,et al.Semantic deep cross-modal hashing[J].Neurocomputing,2020,396:113-122.
[19] MNIH V,HEESS N,GRAVES A.Recurrent models of visual attention[C]//Advances in Neural Information Processing Systems,2014:2204-2212.
[20] YANG M,ZHANG M,CHEN K,et al.Neural machine translation with target-attention model[J].IEICE Transactions on Information and Systems,2020,103(3):684-694.
[21] FU Q,WANG C,HAN X.A CNN-LSTM network with attention approach for learning universal sentence representation in embedded system[J].Microprocessors and Microsystems,2020,74:103051.
[22] YU X,FENG W,WANG H,et al.An attention mechanism and multi-granularity-based Bi-LSTM model for Chinese Q&A system[J].Soft Computing,2020,24(8):5831-5845.
[23] MA W,YANG Q,WU Y,et al.Double-branch multi-attention mechanism network for hyperspectral image classification[J].Remote Sensing,2019,11(11):1307.
[24] ZHU Y,LI R,YANG Y,et al.Learning cascade attention for fine-grained image classification[J].Neural Networks,2020,122:174-182.
[25] GREGOR K,DANIHELKA I,GRAVES A,et al.DRAW:a recurrent neural network for image generation[C]//International Conference on Machine Learning,2015:1462-1471.
[26] ZHANG Q,SHI Y,ZHANG X.Attention and boundary guided salient object detection[J].Pattern Recognition,2020,107:107484.
[27] LIU Y,ZHANG X,HUANG F,et al.Visual question answering via combining inferential attention and semantic space mapping[J].Knowledge-Based Systems,2020,207:106339.
[28] LI W,SUN J,LIU G,et al.Visual question answering with attention transfer and a cross-modal gating mechanism[J].Pattern Recognition Letters,2020,133:334-340.
[29] CAO D,CHU J,ZHU N,et al.Cross-modal recipe retrieval via parallel- and cross-attention networks learning[J].Knowledge-Based Systems,2020,193:105428.
[30] PENG H,HE J,CHEN S,et al.Dual-supervised attention network for deep cross-modal hashing[J].Pattern Recognition Letters,2019,128:333-339.
[31] PENG X,ZHANG X,LI Y,et al.Research on image feature extraction and retrieval algorithms based on convolutional neural network[J].Journal of Visual Communication and Image Representation,2020,69:102705.
[32] HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition,2016:770-778.
[33] WANG Y,YANG H,BAI X,et al.PFAN++:bi-directional image-text retrieval with position focused attention network[J].IEEE Transactions on Multimedia,2020,23:3362-3376.
[34] QIAO B,FAN Z,WANG R,et al.A comparative study of image features and similarity measurement methods in cross-modal retrieval of commodity images[C]//2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications(AEECA),2020.
[35] LENG J,LIU Y,CHEN S.Context-aware attention network for image recognition[J].Neural Computing and Applications,2019,31(12):9295-9305.
[36] WEN K,GU X,CHENG Q.Learning dual semantic relations with graph attention for image-text matching[J].IEEE Transactions on Circuits and Systems for Video Technology,2021,31(7):2866-2879.
[37] JI Z,WANG H,HAN J,et al.SMAN:stacked multimodal attention network for cross-modal image-text retrieval[J].IEEE Transactions on Cybernetics,2022,52(2):1086-1097.
[38] WU Y,WANG S,SONG G,et al.Learning fragment self-attention embeddings for image-text matching[C]//Proceedings of the 27th ACM International Conference on Multimedia,2019:2088-2096.
[39] CHUA T S,TANG J,HONG R,et al.NUS-WIDE:a real-world web image database from National University of Singapore[C]//ACM International Conference on Image & Video Retrieval,2009.
[40] HUISKES M J,LEW M S.The MIR flickr retrieval evaluation[C]//ACM International Conference on Multimedia Information Retrieval,2008.
[41] ESCALANTE H J,HERNÁNDEZ C A,GONZALEZ J A,et al.The segmented and annotated IAPR TC-12 benchmark[J].Computer Vision and Image Understanding,2010,114(4):419-428.
[42] HARDOON D R,SZEDMAK S,SHAWE-TAYLOR J.Canonical correlation analysis:an overview with application to learning methods[J].Neural Computation,2004,16(12):2639-2664.
[43] WANG K,HE R,WANG W,et al.Joint feature selection and subspace learning for cross-modal retrieval[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2016,38(10):2010-2023.