Survey of Multimodal Data Fusion

doi:10.3778/j.issn.1002-8331.2104-0237

Computer Engineering and Applications, 2021, Vol. 57, Issue (18): 49-64. DOI: 10.3778/j.issn.1002-8331.2104-0237

REN Zeyu, WANG Zhenchao, KE Zunwang, LI Zhe, Wushour·Silamu

1.Xinjiang Multilingual Information Technology Laboratory, Xinjiang Multilingual Information Technology Research Center, Urumqi 830046, China
2.School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
3.School of Software, Xinjiang University, Urumqi 830046, China

Online: 2021-09-15 Published: 2021-09-13

Abstract:

With the rapid development of information technology, information now exists in many forms and comes from many sources. Each form or source of information can be regarded as a modality, and data composed of two or more modalities is called multimodal data. Multimodal data fusion integrates the information from multiple modalities so as to draw on the strengths of each. Natural phenomena have very rich characteristics, and a single modality can rarely provide complete information about a phenomenon. Fusion must preserve the diversity and completeness of each modality's information, maximize the advantages of each modality, and reduce the information lost in the fusion process; how to fuse the information of each modality under these requirements has become a new challenge across many fields. This paper briefly describes common multimodal fusion methods and fusion architectures, summarizes three common fusion models, and briefly analyzes the advantages and disadvantages of the three architectures (coordinated, joint, and encoder-decoder) as well as specific fusion methods such as multiple kernel learning and image models. On the application side, it analyzes and summarizes multimodal video clip retrieval, content summarization from combined multimodal information, multimodal sentiment analysis, and multimodal human-computer dialogue systems. The paper also identifies current problems in multimodal fusion and proposes future research directions.

Key words: multimodal, multimodal fusion, multimodal fusion architecture, machine learning, neural network

REN Zeyu, WANG Zhenchao, KE Zunwang, LI Zhe, Wushour·Silamu. Survey of Multimodal Data Fusion[J]. Computer Engineering and Applications, 2021, 57(18): 49-64.
