Computer Engineering and Applications, 2023, Vol. 59, Issue (14): 158-165. DOI: 10.3778/j.issn.1002-8331.2205-0400

• Pattern Recognition and Artificial Intelligence •

Word Sense Disambiguation Combining Knowledge Graph and Text Hierarchical Structure

CAO Yukun, JIN Chengkun, TANG Yijia, WEI Ziyue, LI Yunfeng   

  1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China
  2. IT Center, COMAC Shanghai Aviation Industrial (Group) Co., Ltd., Shanghai 201203, China
  • Online: 2023-07-15 Published: 2023-07-15

Abstract: Current supervised word sense disambiguation (WSD) models, which combine sense-annotated data with pre-trained language models, achieve high disambiguation accuracy. However, they scale poorly because the sense-annotated data must be labeled manually and is difficult to obtain. This article proposes a bi-encoder WSD method that combines a knowledge graph with the hierarchical structure of text. The method introduces structured knowledge from the knowledge graph to supply additional extended semantic information, uses the hierarchical structure of the contextual input text to describe the meanings of words and phrases, and builds a BERT-based bi-encoder. A graph attention network is introduced to reduce noise in the contextual input text, which improves disambiguation accuracy for target words in phrase form and, in turn, the overall effectiveness of the method. In comparisons with nine recent algorithms on five test datasets, the proposed method achieves higher disambiguation accuracy than the comparison algorithms in most cases.

Key words: word sense disambiguation, knowledge graph, BERT, graph attention network
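
As a rough illustration of the bi-encoder idea named in the abstract, the sketch below scores a target-word vector against one vector per candidate sense and picks the highest-scoring sense. Everything in it is an assumption made for illustration: the toy vectors stand in for BERT encoder outputs, the dot-product scoring is the common bi-encoder formulation rather than a detail confirmed by the abstract, and the sense labels and the `disambiguate` helper are hypothetical. The knowledge-graph enrichment and the graph attention network are not modeled.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def disambiguate(target_vec, sense_vecs):
    """Return the best-scoring sense label and a probability per sense.

    target_vec: vector for the target word in context (assumed to come
                from a context encoder).
    sense_vecs: dict mapping a sense label to its vector (assumed to come
                from a separate sense/gloss encoder).
    """
    names = list(sense_vecs)
    scores = [dot(target_vec, sense_vecs[n]) for n in names]
    probs = softmax(scores)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], dict(zip(names, probs))


# Toy stand-ins for encoder outputs (hypothetical values, not from the paper).
target_vec = [0.9, 0.1, 0.2]  # "bank" as in "she sat on the river bank"
sense_vecs = {
    "bank%river": [0.8, 0.0, 0.3],
    "bank%finance": [0.1, 0.9, 0.1],
}

best, probs = disambiguate(target_vec, sense_vecs)
print(best, probs)
```

In a real system the two vectors would be produced by two separately parameterized encoders and trained jointly; keeping them separate is what lets sense vectors be precomputed once and reused across contexts.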
