Computer Engineering and Applications, 2024, Vol. 60, Issue (5): 62-75. DOI: 10.3778/j.issn.1002-8331.2305-0294

• Hot Topics and Reviews •

Review of Research on Multimodal Retrieval

JIN Tao, JIN Ran, HOU Tengda, YUAN Jie, GU Xiaozhe

  1. College of Big Data and Software Engineering, Zhejiang Wanli University, Ningbo, Zhejiang 315100, China
  2. Jiangsu Electric Power Information Technology Co., Ltd., Nanjing, Jiangsu 210003, China
  • Online: 2024-03-01 Published: 2024-03-01

Abstract: The growing volume of multimodal data has drawn increasing attention to multimodal retrieval. As industries such as automotive and medicine adopt computing and big data technologies, much of their data is inherently multimodal, and as these industries develop rapidly, the demand for information keeps rising to a point that single-modal retrieval can no longer meet. To address this need, namely using data in one modality to retrieve data in other modalities, this paper reviews the literature on multimodal retrieval, analyzes the main families of methods, including common subspace learning, deep learning, and multimodal hashing, and surveys the multimodal retrieval techniques proposed in recent years. It then evaluates and compares recent methods in terms of retrieval accuracy, retrieval efficiency, and distinguishing characteristics. Finally, it analyzes the challenges facing multimodal retrieval and discusses its future application prospects.

Key words: multimodal retrieval, common subspace, deep learning, hashing algorithm
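The abstract groups the surveyed methods into common subspace learning, deep learning, and multimodal hashing. The sketch below is not the authors' code; it only illustrates, under stated assumptions, how the first and last of these families rank items of one modality against a query from another. Modality-specific features are projected into a shared space and compared by cosine similarity, or binarized into hash codes and compared by Hamming distance. The random projection matrices, feature dimensions, and toy data are hypothetical stand-ins for what a real method would learn from paired training data.

```python
import math
import random

# Hypothetical dimensions: 4-d image features and 3-d text features are
# projected into an 8-d common subspace (8 bits once hashed).
random.seed(0)
IMG_DIM, TXT_DIM, COMMON_DIM = 4, 3, 8


def random_matrix(rows, cols):
    """Stand-in for a learned projection (assumption: real methods learn it, e.g. via CCA or a network)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vec):
    """Map a modality-specific feature vector x into the common subspace: z = W x."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two common-space vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def to_hash(z):
    """Hashing step: binarize each common-space coordinate by its sign."""
    return tuple(1 if v >= 0 else 0 for v in z)


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_cosine(query_z, gallery_z):
    """Gallery indices sorted from most to least similar (real-valued subspace retrieval)."""
    return sorted(range(len(gallery_z)), key=lambda i: -cosine(query_z, gallery_z[i]))


def rank_by_hamming(query_code, gallery_codes):
    """Gallery indices sorted by increasing Hamming distance (hash-based retrieval)."""
    return sorted(range(len(gallery_codes)), key=lambda i: hamming(query_code, gallery_codes[i]))


if __name__ == "__main__":
    # Text-to-image retrieval on toy data. With untrained random projections
    # the rankings are arbitrary; only the shape of the pipeline is meaningful.
    W_img = random_matrix(COMMON_DIM, IMG_DIM)
    W_txt = random_matrix(COMMON_DIM, TXT_DIM)
    images = [[random.random() for _ in range(IMG_DIM)] for _ in range(5)]
    text_query = [random.random() for _ in range(TXT_DIM)]
    gallery = [project(W_img, x) for x in images]
    q = project(W_txt, text_query)
    print("cosine ranking:", rank_by_cosine(q, gallery))
    print("hamming ranking:", rank_by_hamming(to_hash(q), [to_hash(z) for z in gallery]))
```

The two ranking functions show the usual trade-off the survey compares: real-valued similarity in the common subspace preserves more information, while short binary codes make distance computation and storage much cheaper at some cost in accuracy.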