Semantic Preserving Hash for Cross-Modal Retrieval

doi:10.3778/j.issn.1002-8331.2103-0325

Abstract

Hash representations require little storage and support fast retrieval, so hashing-based cross-modal retrieval has attracted considerable attention. Most supervised cross-modal hashing methods learn semantically discriminative hash codes through regression or graph constraints. However, these methods ignore the semantic discrimination of the hash functions themselves, so out-of-sample data cannot obtain semantic-preserving hash codes, which limits cross-modal retrieval accuracy. To learn semantic-preserving hash codes and hash functions simultaneously, this paper proposes semantic preserving hash (SPH) for cross-modal retrieval. SPH introduces two hash functions, one per modality, that project data from the different modality spaces into a common Hamming space. To make both the hash codes and the hash functions semantically discriminative, SPH introduces a semantic graph and, drawing on the idea of locality preservation, fuses hash code learning and hash function learning into a single framework so that the two are optimized jointly. Extensive experiments on three public multimodal datasets demonstrate the effectiveness and superiority of SPH for cross-modal retrieval.

Key words: cross-modal retrieval, cross-modal hashing, semantic preserving, supervised learning
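
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea rather than the SPH algorithm itself. It assumes linear hash functions binarized by sign, uses random (not learned) projection matrices, and scores codes with a generic graph-weighted Hamming penalty; the function names, matrix shapes, and the penalty form are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def hash_codes(features, projection):
    """Map real-valued features to +/-1 codes via the sign of a linear projection.
    The linear-plus-sign form is an assumption; the abstract does not specify it."""
    return np.where(features @ projection >= 0, 1, -1).astype(np.int8)


def hamming_distance(query_code, database_codes):
    """Hamming distance from one +/-1 code to every row of a +/-1 code matrix.
    For +/-1 codes, dot = n_bits - 2 * distance."""
    n_bits = database_codes.shape[1]
    dots = database_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (n_bits - dots) // 2


def semantic_graph(labels):
    """Binary graph: entry (i, j) is 1 when samples i and j share a label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(np.int32)


def semantic_hamming_penalty(codes_x, codes_y, graph):
    """Sum of cross-modal Hamming distances over pairs linked in the graph.
    A generic locality-preserving style score, not the paper's objective."""
    n_bits = codes_x.shape[1]
    dots = codes_x.astype(np.int32) @ codes_y.astype(np.int32).T
    pairwise = (n_bits - dots) // 2
    return int((graph * pairwise).sum())


# Toy data: 4 paired image/text samples with hypothetical feature sizes.
rng = np.random.default_rng(0)
labels = [0, 0, 1, 1]
image_features = rng.normal(size=(4, 6))
text_features = rng.normal(size=(4, 3))

# Random stand-ins for the two modality-specific hash functions (8-bit codes).
W_image = rng.normal(size=(6, 8))
W_text = rng.normal(size=(3, 8))

image_codes = hash_codes(image_features, W_image)
text_codes = hash_codes(text_features, W_text)

# Text-to-image retrieval: rank images by Hamming distance to the first text code.
ranking = np.argsort(hamming_distance(text_codes[0], image_codes))
penalty = semantic_hamming_penalty(image_codes, text_codes, semantic_graph(labels))
print(ranking, penalty)
```

With learned rather than random projections, a lower penalty would indicate that samples sharing a label receive nearby codes across modalities, which is the property the abstract attributes to both the hash codes and the hash functions.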

KANG Peipei, LIN Zehang, YANG Zhenguo, ZHANG Zitong, LIU Wenyin. Semantic Preserving Hash for Cross-Modal Retrieval[J]. Computer Engineering and Applications, 2022, 58(21): 149-155.

