Computer Engineering and Applications (计算机工程与应用) ›› 2022, Vol. 58 ›› Issue (24): 61-72. DOI: 10.3778/j.issn.1002-8331.2205-0064

• Hot Topics and Reviews •

Review of Cross-Modal Retrieval

HOU Tengda, JIN Ran, WANG Yanyi, JIANG Yikai   

  1. College of Big Data and Software Engineering, Zhejiang Wanli University, Ningbo, Zhejiang 315100, China
  2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  • Online: 2022-12-15   Published: 2022-12-15

Abstract: In recent years, various types of media data, such as audio, text, image, and video, have grown explosively on the Internet, and different types of data are often used to describe the same event or topic. Cross-modal retrieval (CMR) provides effective methods for searching semantically related results in a different modality for a given query in any modality, allowing users to obtain richer information about an event or topic; in effect, data in one modality is used to retrieve data in another. As data-retrieval demands and new technologies have developed, single-modality retrieval can no longer satisfy users' needs, and researchers have proposed many cross-modal retrieval techniques to address this problem. This paper surveys recent work in the field of cross-modal retrieval: it briefly analyzes traditional cross-modal retrieval methods, focuses on the methods proposed in the past five years, and compares their performance. Finally, it summarizes the problems currently facing cross-modal retrieval research and offers an outlook on future development.
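To make the abstract's core idea concrete, the following is a minimal illustrative sketch (not taken from the paper) of how cross-modal retrieval typically works once both modalities have been mapped into a shared embedding space: a text query is embedded, and images are ranked by cosine similarity to it. All vectors and filenames here are invented for demonstration; a real system would produce the embeddings with learned image and text encoders.

```python
import math

# Hypothetical image embeddings in a shared 4-dimensional space.
# In practice these would come from a trained image encoder.
image_embeddings = {
    "img_dog.jpg":  [0.9, 0.1, 0.0, 0.2],
    "img_car.jpg":  [0.1, 0.8, 0.3, 0.0],
    "img_bird.jpg": [0.7, 0.2, 0.1, 0.6],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, gallery, top_k=2):
    """Rank items of the other modality by similarity to the query."""
    ranked = sorted(gallery.items(),
                    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# A text query (e.g. "a dog") embedded into the same space (made-up vector).
text_query = [0.8, 0.0, 0.1, 0.3]
print(retrieve(text_query, image_embeddings))  # → ['img_dog.jpg', 'img_bird.jpg']
```

Subspace-learning methods learn the projections that place both modalities into such a common space, while cross-modal hashing methods (another keyword of this survey) map items to compact binary codes and replace the cosine ranking with fast Hamming-distance comparison.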

Key words: cross-modal retrieval, subspace learning, deep learning, cross-modal hashing