Domain Entity Disambiguation Combining Multi-Feature Graph and Entity Influence

doi:10.3778/j.issn.1002-8331.2109-0494

Abstract

Entity disambiguation, a key problem in natural language processing, aims to map ambiguous mentions in text to target entities in a knowledge base. Existing approaches have several shortcomings: they disambiguate only a single mention at a time, they ignore the effect of entity influence and of similarity between candidate entities on the disambiguation result, and redundant graph nodes increase the cost of graph computation. To address these problems, a domain entity disambiguation method combining a multi-feature graph and entity influence is proposed. Taking the financial domain as an example, a financial-domain knowledge base is first constructed by extracting the triples related to financial-category keywords from CN-DBpedia. For texts describing financial activities, the mentions to be disambiguated are extracted, and candidate entities are screened by fusing string and semantic similarity features. The triples of the knowledge base are then used to obtain the relations between candidate entities within 2 hops, while the similarity between candidate entities is computed and used as edge weights, so that the multiple features are integrated into the graph model and the multi-feature graph is constructed. Finally, a dynamic decision strategy is adopted: the PageRank algorithm is combined with entity influence to compute a comprehensive score for each candidate entity in the multi-feature graph, which yields disambiguation results with high reliability. Experimental results verify the accuracy and efficiency of the proposed method for entity disambiguation in a specific domain.

Key words: domain entity disambiguation, entity linking, multi-feature graph, entity influence, knowledge base
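
The scoring step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the PageRank score and entity influence are combined or how the dynamic decision strategy works. The linear combination, the `alpha` weight, the max-normalisation of ranks, the function names, and all entity names and numbers below are assumptions made for illustration only.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph with positive edge weights.

    edges maps (node_a, node_b) -> similarity weight.
    """
    neighbours = defaultdict(dict)
    for (a, b), weight in edges.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight
    nodes = sorted(neighbours)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    total_weight = {v: sum(neighbours[v].values()) for v in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            # Each neighbour u passes on its rank in proportion to the edge weight.
            incoming = sum(rank[u] * w / total_weight[u]
                           for u, w in neighbours[v].items())
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def disambiguate(candidates, edges, influence, alpha=0.7):
    """Choose one candidate per mention by a combined score.

    Assumed score: alpha * normalised PageRank + (1 - alpha) * influence.
    """
    rank = weighted_pagerank(edges)
    top = max(rank.values())
    result = {}
    for mention, entities in candidates.items():
        scores = {e: alpha * rank.get(e, 0.0) / top
                     + (1.0 - alpha) * influence.get(e, 0.0)
                  for e in entities}
        result[mention] = max(scores, key=scores.get)
    return result


# Hypothetical toy data: two mentions, three candidate entities.
candidates = {"Apple": ["Apple Inc.", "apple (fruit)"], "Nasdaq": ["Nasdaq"]}
edges = {("Apple Inc.", "Nasdaq"): 0.9, ("apple (fruit)", "Nasdaq"): 0.1}
influence = {"Apple Inc.": 0.8, "apple (fruit)": 0.5, "Nasdaq": 0.7}

result = disambiguate(candidates, edges, influence)
print(result)
```

In this toy graph the candidate that is strongly connected to the other mention's entity receives the higher rank, so both mentions are resolved together rather than one at a time.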

SHAN Xiaohuan, QI Xin’ao, SONG Baoyan, ZHANG Haolin. Domain Entity Disambiguation Combining Multi-Feature Graph and Entity Influence[J]. Computer Engineering and Applications, 2023, 59(5): 305-311.

单晓欢, 齐鑫傲, 宋宝燕, 张浩林. 融合多特征图及实体影响力的领域实体消歧[J]. 计算机工程与应用, 2023, 59(5): 305-311.
