Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (23): 12-23. DOI: 10.3778/j.issn.1002-8331.2205-0160

• Hot Topics and Reviews •

Overview of Cross-Modal Retrieval Technology

XU Wenwan, ZHOU Xiaoping, WANG Jia

  1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
  • Online: 2022-12-01 Published: 2022-12-01

Abstract: Cross-modal retrieval uses a query in one modality to retrieve information in other modalities, and it has become a research hotspot in the era of big data. Researchers use real-valued representations and binary representations to narrow the semantic gap between modalities and to compare similarity effectively, but low retrieval efficiency and information loss remain problems. How to further improve retrieval efficiency and information utilization is currently a key challenge for cross-modal retrieval research. This paper first introduces the development status of the real-valued and binary representation methods in cross-modal retrieval. It then analyzes and compares five cross-modal retrieval methods under these two representation techniques, organized around modeling technique and similarity comparison: subspace learning, topic statistical model learning, deep learning, traditional hashing, and deep hashing. Next, it summarizes the latest multimodal datasets to provide a useful reference for researchers and engineers. Finally, it analyzes the challenges facing cross-modal retrieval and points out future research directions in the field.

Key words: cross-modal retrieval, deep learning, hash learning