Computer Engineering and Applications, 2023, Vol. 59, Issue (5): 305-311. DOI: 10.3778/j.issn.1002-8331.2109-0494

• Engineering and Applications •

Domain Entity Disambiguation Combining Multi-Feature Graph and Entity Influence

SHAN Xiaohuan, QI Xin’ao, SONG Baoyan, ZHANG Haolin   

  1. College of Information, Liaoning University, Shenyang 110036, China
  • Online: 2023-03-01 Published: 2023-03-01

融合多特征图及实体影响力的领域实体消歧

单晓欢,齐鑫傲,宋宝燕,张浩林   

  1. 辽宁大学 信息学院,沈阳 110036

Abstract: Entity disambiguation, a key problem in natural language processing, aims to map ambiguous mentions in text to their target entities in a knowledge base. Existing approaches have several shortcomings: they disambiguate only a single mention at a time, ignore the effect of entity influence and of the similarity between candidate entities on the disambiguation result, and incur extra graph-computation cost from redundant graph nodes. To address these problems, a domain entity disambiguation method combining a multi-feature graph and entity influence is proposed. Taking the financial domain as an example, a financial knowledge base is first constructed by extracting triples related to financial-category keywords from CN-DBpedia. Next, for texts about financial activities, the mentions to be disambiguated are extracted, and candidate entities are screened by fusing string and semantic similarity features. The knowledge base triples are then used to obtain the relationships between candidate entities within 2 hops, and the similarity between candidate entities is computed as the edge weight, so that the multiple features are fully integrated into the graph model to complete the multi-feature graph. Finally, a dynamic decision strategy is adopted: the PageRank algorithm is combined with entity influence to compute a comprehensive score for each candidate entity in the multi-feature graph, yielding highly reliable disambiguation results. Experimental results verify the accuracy and efficiency of the proposed method for entity disambiguation in a specific domain.

Key words: domain entity disambiguation, entity linking, multi-feature graph, entity influence, knowledge base
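
The scoring step described in the abstract (PageRank over a candidate-entity graph whose edge weights are similarities, combined with entity influence) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy entities and weights, the precomputed `influence` values, the linear mixing weight `alpha`, and the choice of one best candidate per mention. The paper's dynamic decision strategy and its exact scoring formula are not reproduced.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected, weighted graph.

    edges maps (u, v) -> positive weight (here: candidate similarity).
    """
    neighbors = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w
    nodes = sorted(neighbors)
    n = len(nodes)
    # Total edge weight leaving each node, used to normalise transitions.
    out_weight = {node: sum(neighbors[node].values()) for node in nodes}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / n + damping * sum(
                rank[nb] * w / out_weight[nb]
                for nb, w in neighbors[node].items()
            )
            for node in nodes
        }
    return rank


def disambiguate(candidates, edges, influence, alpha=0.7):
    """Pick, per mention, the candidate with the best combined score.

    The score alpha * PageRank + (1 - alpha) * influence is an assumed
    combination, not the formula used in the paper.
    """
    rank = weighted_pagerank(edges)

    def score(entity):
        return alpha * rank.get(entity, 0.0) + (1 - alpha) * influence.get(entity, 0.0)

    return {mention: max(cands, key=score) for mention, cands in candidates.items()}


# Hypothetical toy data: three mentions and their candidate entities.
candidates = {
    "Apple": ["Apple Inc.", "Apple (fruit)"],
    "Nasdaq": ["Nasdaq Stock Market"],
    "Cook": ["Tim Cook", "Cook (profession)"],
}
# Hypothetical similarity weights between candidates related within 2 hops.
edges = {
    ("Apple Inc.", "Nasdaq Stock Market"): 0.8,
    ("Apple Inc.", "Tim Cook"): 0.9,
    ("Nasdaq Stock Market", "Tim Cook"): 0.5,
    ("Apple (fruit)", "Nasdaq Stock Market"): 0.1,
    ("Cook (profession)", "Nasdaq Stock Market"): 0.1,
}
# Hypothetical influence scores in [0, 1], assumed to be precomputed.
influence = {
    "Apple Inc.": 0.9,
    "Apple (fruit)": 0.4,
    "Nasdaq Stock Market": 0.8,
    "Tim Cook": 0.6,
    "Cook (profession)": 0.5,
}

result = disambiguate(candidates, edges, influence)
```

In this toy graph the tightly connected financial candidates reinforce one another, so all mentions are resolved jointly rather than one at a time, which is the motivation the abstract gives for a graph-based approach.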

摘要: 实体消歧作为自然语言处理的关键问题,旨在将文本中出现的歧义实体指称映射到知识库中的目标实体。针对现有方法存在仅实现单实体指称消歧、忽略了实体影响力及候选实体间相似度对消歧结果的影响以及冗余图节点增加图计算复杂性等问题,提出了一种融合多特征图及实体影响力的领域实体消歧方法,以金融领域为例,提取CN-DBpedia中金融类别相关关键词三元组,构建金融领域知识库;针对金融活动类文本,提取待消歧实体指称,融合字符串及语义的相似特征,筛选出候选实体,利用知识库三元组信息获取候选实体间2-hop内的关系,同时计算候选实体间相似度作为边权值,进而将多特征信息充分融合到图模型当中,完成多特征图构建;采用动态决策策略,利用PageRank算法,并结合实体影响力计算多特征图中候选实体的综合评分,进而获得可信度较高的消歧结果。实验结果验证了提出方法在特定领域实体消歧的精确度及效率。

关键词: 领域实体消歧, 实体链接, 多特征图, 实体影响力, 知识库