Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (2): 102-109. DOI: 10.3778/j.issn.1002-8331.2202-0051

• Pattern Recognition and Artificial Intelligence •

Research on Text Classification Based on Knowledge Graph and Multimodal Information

JING Li, YAO Ke   

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China
  • Online: 2023-01-15  Published: 2023-01-15

Abstract: Traditional text classification methods are mainly empirical statistical learning methods driven by single-modal data; they lack the ability to understand the data and have poor robustness, and a single-modal model input can hardly analyze the increasingly rich multimodal data on the Internet effectively. To address this problem, two methods for improving classification ability are proposed: introducing multimodal information into the model input to compensate for the limitations of single-modal information, and introducing knowledge graph entity information into the model input to enrich the semantic information of the text and improve the model's generalization ability. The model uses BERT to extract text features, an improved ResNet to extract image features, and TransE to extract text entity features, which are fed into the BERT model for classification through early fusion. On the MM-IMDB multi-label classification dataset the F1 score reaches 66.5%, and on the Twitter15&17 sentiment analysis dataset the accuracy reaches 71.1%, both outperforming other models. Experimental results show that introducing multimodal information and entity information can improve the text classification ability of the model.
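
The paper's implementation is not reproduced on this page, so the following is only a minimal PyTorch sketch of the early-fusion idea described in the abstract. It assumes a plain ResNet-50 backbone standing in for the authors' improved ResNet, 100-dimensional pre-trained TransE entity vectors supplied by the caller, and the Hugging Face BertModel; the class and parameter names (EarlyFusionClassifier, entity_dim, etc.) are illustrative, not the authors' code.

```python
# Minimal early-fusion sketch (PyTorch + transformers + torchvision), illustrative only.
import torch
import torch.nn as nn
from transformers import BertModel
from torchvision.models import resnet50


class EarlyFusionClassifier(nn.Module):
    """Fuses text, image, and knowledge-graph entity features at the input of BERT:
    image and entity vectors are projected to BERT's hidden size and prepended to
    the word embeddings as extra "tokens" (early fusion)."""

    def __init__(self, num_labels, entity_dim=100, hidden=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Image encoder: ResNet-50 with its classification head removed
        # (load pretrained weights in practice; random init keeps the sketch offline).
        backbone = resnet50(weights=None)
        self.image_encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.image_proj = nn.Linear(2048, hidden)
        # TransE entity vectors are assumed to be pre-trained and passed in.
        self.entity_proj = nn.Linear(entity_dim, hidden)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, image, entity_emb):
        # Word-embedding lookup only; BERT adds position embeddings internally.
        # Text should be truncated so that the fused length stays within 512 tokens.
        text_tok = self.bert.embeddings.word_embeddings(input_ids)       # (B, L, H)
        img_tok = self.image_proj(self.image_encoder(image).flatten(1))  # (B, H)
        ent_tok = self.entity_proj(entity_emb)                           # (B, H)
        # Early fusion: prepend the two modality "tokens" to the text tokens.
        fused = torch.cat([img_tok.unsqueeze(1), ent_tok.unsqueeze(1), text_tok], dim=1)
        extra = attention_mask.new_ones(attention_mask.size(0), 2)
        mask = torch.cat([extra, attention_mask], dim=1)
        out = self.bert(inputs_embeds=fused, attention_mask=mask)
        return self.classifier(out.pooler_output)                        # (B, num_labels)
```

For a multi-label dataset such as MM-IMDB the logits would typically be trained with BCEWithLogitsLoss, while single-label sentiment classification on Twitter15&17 would use CrossEntropyLoss.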

Key words: natural language processing (NLP), knowledge graph, multimodal, text classification, bidirectional encoder representations from transformers (BERT)