Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (17): 126-134. DOI: 10.3778/j.issn.1002-8331.1705-0098

• Pattern Recognition and Artificial Intelligence •

Research on Improving Domain-Oriented Named Entity Disambiguation Methods

ZENG Weixin1, ZHAO Xiang1,2, FENG Tao1, TANG Jiuyang1,2

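  1. College of Systems Engineering, National University of Defense Technology, Changsha 410073
  2. Collaborative Innovation Center of Geospatial Information Technology, Wuhan 430079
  • Publication date: 2018-09-01  Release date: 2018-08-30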

Improved domain-oriented named entity disambiguation method study

ZENG Weixin1, ZHAO Xiang1,2, FENG Tao1, TANG Jiuyang1,2   

  1. College of Information System and Management, National University of Defense Technology, Changsha 410073, China
  2. Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China
  • Online: 2018-09-01  Published: 2018-08-30

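Abstract: Named entity disambiguation is the process of correctly mapping ambiguous entity mentions in natural language text to the corresponding entities in a knowledge base. Most existing named entity disambiguation techniques adopt collective disambiguation, which exploits more semantic information to achieve higher accuracy but suffers from low efficiency. To address this, a domain-based named entity disambiguation method is proposed. It introduces the concept of domain to enrich the feature set and uses that feature set to build a dependency graph between mentions and candidate entities for collective disambiguation. When constructing the dependency graph, the method extends existing construction approaches by using the relations between mentions to add links on the mention side, which completes the structure of the whole graph and indirectly optimizes the processing order of the algorithm. Experimental results on real-world evaluation datasets show that this method is more efficient and more accurate than other methods of the same kind.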

Key words: named entity disambiguation, domain, dependency graph, approximation algorithm

Abstract: Named Entity Disambiguation (NED) maps ambiguous mentions in natural language text to the corresponding entities in a knowledge base. Most current NED techniques use collective disambiguation to harness more semantic information and achieve higher precision, but they suffer from relatively low efficiency. This paper proposes a domain-oriented NED method that introduces the concept of domain to enrich the feature set and uses these features to construct a mention-entity dependency graph for collective disambiguation. When building the dependency graph, the method extends existing construction methods by establishing extra links between mentions based on the relations among them. As a result, the structure of the dependency graph is improved, and the processing order of the algorithm is indirectly optimized. Experimental results on real-life benchmark datasets show that this method is more efficient and more accurate than other methods of its kind.

Key words: named entity disambiguation, domain, dependency graph, approximation algorithm