Computer Engineering and Applications, 2022, Vol. 58, Issue (17): 50-60. DOI: 10.3778/j.issn.1002-8331.2203-0243

• Research Hotspots and Reviews •

Review of Language Processing Methods for Visual Question Answering

WANG Ruiping, WU Shihong, ZHANG Meihang, WANG Xiaoping   

  1. YGSOFT Inc., Zhuhai, Guangdong 519085, China
  2. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
  3. School of Mechanical Automation, Wuhan University of Science and Technology, Wuhan 430081, China
  • Online: 2022-09-01 Published: 2022-09-01

视觉问答语言处理方法综述

王瑞平,吴士泓,张美航,王小平   

  1. 远光软件股份有限公司远光研究院,广东 珠海 519085
  2. 华中科技大学 人工智能与自动化学院,武汉 430074
  3. 武汉科技大学 机械自动化学院,武汉 430081

Abstract: Language processing methods strongly affect the performance of visual question answering (VQA) models. These methods and their underlying theory derive from natural language processing (NLP), but as VQA has developed they have fallen out of step with state-of-the-art NLP research, which hinders both question understanding and answer generation. Subjectively, the root cause is that researchers underestimate the importance of language processing methods; objectively, the relevant research literature is scarce. To address these problems, this paper analyzes the role and value of language processing in VQA, surveys the language processing methods used in VQA alongside the latest research results in NLP, and summarizes the relevant application scenarios of NLP. The results give researchers a basis for recognizing the importance of language processing. Finally, the paper looks ahead to the future development of language processing and to how NLP technology can advance VQA, and discusses its own limitations.

Key words: visual question answering, natural language processing, language model, deep neural network, artificial intelligence

摘要: 视觉问答中的语言处理方法对视觉问答模型的性能影响巨大。语言处理方法源于自然语言处理,但在发展过程中与自然语言处理领域最先进技术脱节,导致视觉问答中涉及的问题理解和答案生成受阻。产生这一问题的根源主观上是研究人员对语言处理方法的重要性认识不足,客观上则是相关研究文献的匮乏。针对上述问题,通过分析语言处理对视觉问答的价值,调查视觉问答中涉及到的语言处理方法和最新研究成果,归纳总结语言处理方法的类型,从而为研究人员认识语言处理重要性提供基础;探讨了自然语言处理技术对视觉问答中语言处理方法的推动作用,并展望了语言处理方法未来的发展方向。

关键词: 视觉问答, 自然语言处理, 语言模型, 深度神经网络, 人工智能