计算机工程与应用 ›› 2022, Vol. 58 ›› Issue (14): 306-312. DOI: 10.3778/j.issn.1002-8331.2012-0263

• 工程与应用 •

基于知识图谱的中文地址匹配方法研究

陈雨晖,皮洲,姜滕圣,李响,王震,奚雪峰,吴宏杰,付保川   

  1. 苏州科技大学 电子与信息工程学院,江苏 苏州 215009
  2. 苏州市公安局,江苏 苏州 215009
  3. 苏州科技大学 苏州智慧城市研究院,江苏 苏州 215009
  • 出版日期:2022-07-15 发布日期:2022-07-15

Research on Chinese Address Matching Based on Knowledge Graph

CHEN Yuhui, PI Zhou, JIANG Tengsheng, LI Xiang, WANG Zhen, XI Xuefeng, WU Hongjie, FU Baochuan   

  1. School of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  2. Suzhou Public Security Bureau, Suzhou, Jiangsu 215009, China
  3. Suzhou Smart City Research Institute, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  • Online: 2022-07-15 Published: 2022-07-15

摘要: 随着信息技术的迅猛发展,建设新型高效智慧型城市已成为趋势。智慧城市中有大量以地理信息为基础的应用场景,如在城市规划建设、城市便民生活服务、城市细化管理等都离不开地理信息。由于中文地址的复杂性与人工输入的不确定性,地址数据不规范性、不一致、不明确现象给业务系统之间与内部带来了很多困难。急需优秀的中文地址匹配方法。现有的匹配方法仅从地址文字出发进行匹配,而忽略地址作为一个实体蕴含着丰富的地理知识,这些知识可以有效地协助匹配过程,由此,提出注意力知识图谱的中文地址匹配方法,从而解决复杂中文地址匹配准确率低的问题。通过对传统的标准地址库进行地址分词以及特征抽取,建立标准地址知识图谱与POI知识图谱;采用基于选择注意力机制的知识图谱关系抽取方法来进行对地址的特征提取,从而进行地址分类;通过计算知识图谱实体相似度,从而进行非标中文地址的地址匹配。实验结果表明,该方法较基于Jaccard相似度的地址匹配方法、基于动态规划的地址匹配方法、基于Sorensen Dice的全文检索地址匹配方法和基于bert4keras预训练模型的地址匹配方法准确率分别提高了11.05%、15.30%、11.05%、0.95%,有效对复杂中文地址进行匹配。

关键词: 知识图谱, 中文地址, 地址匹配

Abstract: With the rapid development of information technology, building new, efficient smart cities has become a trend. Smart cities contain many application scenarios built on geographic information, such as urban planning and construction, convenience services for residents, and fine-grained urban management. Because Chinese addresses are complex and manual input is unreliable, address data is often non-standard, inconsistent, and ambiguous, which causes many difficulties both between and within business systems, so an effective Chinese address matching method is urgently needed. Existing methods match on the address text alone and ignore the fact that an address, as an entity, carries rich geographic knowledge that can effectively assist the matching process. This paper therefore proposes an attention-based knowledge graph method for Chinese address matching to address the low matching accuracy on complex Chinese addresses. First, a standard address knowledge graph and a POI knowledge graph are built by applying address segmentation and feature extraction to a traditional standard address library. Second, a knowledge graph relation extraction method based on a selective attention mechanism is used to extract address features and classify addresses. Finally, non-standard Chinese addresses are matched by computing the similarity of knowledge graph entities. Experimental results show that, compared with address matching methods based on Jaccard similarity, dynamic programming, Sorensen Dice full-text retrieval, and the bert4keras pre-trained model, the proposed method improves accuracy by 11.05%, 15.30%, 11.05%, and 0.95%, respectively, and matches complex Chinese addresses effectively.

Key words: knowledge graph, Chinese address, address matching
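
For readers unfamiliar with the text-only baselines named in the abstract, the following is a minimal sketch of Jaccard and Sorensen Dice similarity applied to address strings. It is not the authors' implementation: the abstract does not say how addresses are tokenized, so the character-bigram tokenization, the function names (`char_bigrams`, `jaccard`, `sorensen_dice`, `best_match`), and the sample addresses are all assumptions made for illustration.

```python
from typing import Callable, List, Set, Tuple


def char_bigrams(text: str) -> Set[str]:
    """Return the set of character bigrams of `text`, ignoring whitespace.

    Bigram tokenization is an assumption; the abstract does not specify one.
    """
    text = "".join(text.split())
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sorensen_dice(a: Set[str], b: Set[str]) -> float:
    """Sorensen Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def best_match(
    query: str,
    standard_addresses: List[str],
    similarity: Callable[[Set[str], Set[str]], float] = jaccard,
) -> Tuple[str, float]:
    """Return the standard address most similar to `query` and its score."""
    q = char_bigrams(query)
    scored = [(addr, similarity(q, char_bigrams(addr))) for addr in standard_addresses]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical standard address library and non-standard query.
    library = ["苏州市虎丘区示例路1号", "苏州市姑苏区样例街8号"]
    print(best_match("虎丘区示例路1号", library))
    print(best_match("虎丘区示例路1号", library, similarity=sorensen_dice))
```

Both measures depend only on overlapping text fragments, which illustrates the limitation the abstract points out: two strings that refer to the same place through different wording (for example, a POI name versus a street number) share few fragments and score low, whereas a knowledge-graph approach can relate them through entity-level information.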