Computer Engineering and Applications, 2022, Vol. 58, Issue (12): 280-288. DOI: 10.3778/j.issn.1002-8331.2111-0272

• Engineering and Applications •

Research on Construction of Tourism Knowledge Graph Integrating BERT-WWM and Pointer Network

XU Chun, LI Shengnan

  1. School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  • Online: 2022-06-15 Published: 2022-06-15

Abstract: To address the scattered, disordered, and weakly connected nature of tourism information, this paper proposes a joint entity and relation extraction model that integrates BERT-WWM (BERT with whole word masking) and a pointer network, and uses it to construct a tourism knowledge graph. The BERT-WWM pre-trained language model encodes sentences from crawled tourism reviews into representations that carry prior semantic knowledge. Traditional entity and relation extraction methods suffer from error propagation, entity redundancy, and a lack of interaction, and the entities and relations in tourism reviews exhibit polysemy and overlapping relations. The proposed approach therefore models triples directly: it extracts head entities from the sentence encoding, extracts tail entities for each relation category, and uses a cascade structure with pointer network decoding to output triples. The extracted triples are stored in the Neo4j graph database to build the tourism knowledge graph. Experiments on a self-built tourism dataset show that the model achieves a precision, recall, and F1 score of 93.42%, 86.59%, and 89.88%, respectively. It outperforms existing models on all three metrics, which verifies the effectiveness of the method for joint entity and relation extraction. The constructed tourism knowledge graph integrates and stores scenic spot information and offers a practical reference for further promoting the development of the tourism industry.

Key words: BERT with whole word masking (BERT-WWM), pointer network, tourism knowledge graph, relationship overlap, joint extraction of entity and relation
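
The cascade decoding summarized in the abstract (head entities first, then tail entities per relation category, each located by pointer-style start and end positions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `decode_spans` and `decode_triples`, the 0.5 threshold, the nearest-end pairing rule, the relation labels, and the toy probabilities are all assumptions made for illustration. In a real model the probabilities would come from classifiers on top of the BERT-WWM sentence encoding.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices
Pointers = Tuple[List[float], List[float]]  # (start_probs, end_probs)


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Span]:
    """Pair each start above the threshold with the nearest end at or after it."""
    spans: List[Span] = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        for j in range(i, len(end_probs)):
            if end_probs[j] >= threshold:
                spans.append((i, j))
                break
    return spans


def decode_triples(tokens: List[str], head_ptrs: Pointers,
                   tail_ptrs: Dict[Span, Dict[str, Pointers]],
                   threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Cascade: decode head spans, then tail spans for every relation of each head."""
    def text(span: Span) -> str:
        return " ".join(tokens[span[0]:span[1] + 1])

    triples = []
    for head in decode_spans(*head_ptrs, threshold):
        for relation, (t_start, t_end) in tail_ptrs.get(head, {}).items():
            for tail in decode_spans(t_start, t_end, threshold):
                triples.append((text(head), relation, text(tail)))
    return triples


# Toy inputs (hypothetical values, not taken from the paper's dataset).
tokens = ["Tianchi", "Lake", "is", "in", "Fukang"]
head_ptrs = ([0.9, 0.1, 0.0, 0.0, 0.2], [0.2, 0.8, 0.0, 0.0, 0.3])
tail_ptrs = {
    (0, 1): {
        "located_in": ([0.0, 0.0, 0.0, 0.0, 0.95], [0.0, 0.0, 0.0, 0.0, 0.9]),
        "ticket_price": ([0.0] * 5, [0.0] * 5),  # no tail found for this relation
    }
}
triples = decode_triples(tokens, head_ptrs, tail_ptrs)
print(triples)
```

Because tails are decoded separately for each relation, one head entity can take part in several triples, which is how a cascade of this kind accommodates overlapping relations. Relations with no start position above the threshold contribute nothing.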