Computer Engineering and Applications, 2022, Vol. 58, Issue (6): 142-148. DOI: 10.3778/j.issn.1002-8331.2009-0480

Pattern Recognition and Artificial Intelligence

Research on Efficient Knowledge Fusion Method for Heterogeneous Big Data Environments

WANG Yu, WANG Xin, ZHANG Shujuan, ZHENG Guoqiang, ZHAO Long, ZHENG Gaofeng   

1. State Grid Anhui Electric Power Research Institute, Hefei 230601, China
2. State Grid Anhui Electric Power Co., Ltd., Hefei 230022, China
Online: 2022-03-15   Published: 2022-03-15

Abstract: Knowledge fusion is one of the fundamental steps in knowledge graph technology. However, traditional machine learning (ML) approaches applied to knowledge fusion may offer limited accuracy and real-time performance in heterogeneous big data environments. This paper therefore proposes a reliable, low-complexity knowledge fusion method that integrates a concept drift detection algorithm with a reverse verification algorithm. First, entity matching and attribute fusion are performed by Bayesian estimation. Meanwhile, isolation forest (iForest) based concept drift detection is performed on historical data samples to improve the reliability of the data model. In addition, self-organizing map (SOM) based unsupervised reverse verification is performed periodically to disambiguate entities across heterogeneous databases. In this way, the method compensates for the main shortcomings of supervised and unsupervised ML, namely dependence on historical data and high computational complexity, respectively. The proposed method is tested on an open dataset and on a knowledge graph system developed by the State Grid Anhui Electric Power Research Institute. Its knowledge fusion performance is compared in terms of data model reliability, entity matching ability, F1 score and running time, and the experimental results are used to analyze and validate its practicability in heterogeneous big data environments.
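The abstract says entity matching and attribute fusion are done by Bayesian estimation but gives no formulas. As a minimal sketch under our own assumptions (not the authors' model), the Python below scores a candidate record pair with a naive Bayes, Fellegi-Sunter-style likelihood ratio over attribute agreements, and resolves conflicting attribute values with an accuracy-weighted vote. The attribute names, the (m, u) probabilities, the prior and the source accuracies are all hypothetical illustrations.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-attribute probabilities (illustrative assumptions only):
# (m, u) = (P(values agree | same entity), P(values agree | different entities))
AGREEMENT_PROBS: Dict[str, Tuple[float, float]] = {
    "name": (0.95, 0.05),
    "voltage_level": (0.90, 0.30),
    "location": (0.85, 0.10),
}


def match_posterior(a: Dict[str, str], b: Dict[str, str], prior: float = 0.1) -> float:
    """P(same entity | agreement pattern), assuming attributes are conditionally independent."""
    log_odds = math.log(prior / (1.0 - prior))  # start from the prior odds of a match
    for attr, (m, u) in AGREEMENT_PROBS.items():
        if attr not in a or attr not in b:
            continue  # a missing attribute contributes no evidence
        if a[attr] == b[attr]:
            log_odds += math.log(m / u)  # agreement favours a match
        else:
            log_odds += math.log((1.0 - m) / (1.0 - u))  # disagreement favours a non-match
    return 1.0 / (1.0 + math.exp(-log_odds))  # convert log-odds back to a probability


def fuse_attribute(claims: List[Tuple[str, float]]) -> str:
    """Pick one value from conflicting (value, source_accuracy) claims.

    Each source votes for its value with weight log(acc / (1 - acc)); this is a
    simplified Bayesian vote that ignores how many alternative values exist.
    """
    scores: Dict[str, float] = {}
    for value, accuracy in claims:
        scores[value] = scores.get(value, 0.0) + math.log(accuracy / (1.0 - accuracy))
    return max(scores, key=scores.__getitem__)
```

For two records that agree on all three attributes, `match_posterior` returns roughly 0.98, while disagreement on all three drives it close to zero. Working in log-odds keeps each attribute's evidence additive, which is why the sketch sums log-likelihood ratios instead of multiplying probabilities.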

Key words: knowledge fusion, machine learning, concept drift, reverse verification, big data

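Chinese abstract: Knowledge fusion is a key step in knowledge graph technology, but traditional machine learning algorithms struggle to meet the accuracy and real-time requirements of knowledge fusion in heterogeneous big data environments. A highly reliable, low-complexity knowledge fusion method is proposed that combines a concept drift detection algorithm with an unsupervised reverse verification algorithm. While using Bayesian estimation for entity alignment and attribute fusion, the method periodically performs isolation forest based concept drift detection and self-organizing map based reverse entity disambiguation. These steps compensate for the sample dependence of supervised learning and the high complexity of unsupervised learning, improving the reliability and real-time performance of knowledge fusion. Experiments were conducted on a public dataset and on the knowledge graph database of State Grid Anhui Electric Power Co., Ltd. By comparing data model reliability, entity alignment ability, F1 score and running time, the feasibility of applying the proposed algorithm in multi-dimensional, heterogeneous big data environments is analyzed.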
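The isolation forest based drift check can be illustrated with a small, dependency-free isolation forest: random-split trees built on subsamples of size ψ with a height limit of ceil(log2 ψ), and an anomaly score of 2^(-E[h(x)]/c(ψ)). This is a sketch under our own assumptions, not the authors' implementation. A forest is fit on historical samples, and drift is flagged when the share of new samples scoring above `threshold` exceeds `max_outlier_frac`. Both thresholds, the subsample size and the tree count are hypothetical defaults.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649


def _avg_path(n: int) -> float:
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


class _Node:
    def __init__(self, size: int, feature: Optional[int] = None, split: float = 0.0,
                 left: "Optional[_Node]" = None, right: "Optional[_Node]" = None):
        self.size, self.feature, self.split = size, feature, split
        self.left, self.right = left, right


def _build(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> _Node:
    if depth >= max_depth or len(points) <= 1:
        return _Node(len(points))  # external (leaf) node
    feature = rng.randrange(len(points[0]))  # random feature
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return _Node(len(points))  # all values equal: cannot split on this feature
    split = rng.uniform(lo, hi)  # random split between min and max
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return _Node(len(points), feature, split,
                 _build(left, depth + 1, max_depth, rng),
                 _build(right, depth + 1, max_depth, rng))


def _path_length(node: _Node, x: Point) -> float:
    depth = 0
    while node.feature is not None:  # descend until a leaf is reached
        node = node.left if x[node.feature] < node.split else node.right
        depth += 1
    return depth + _avg_path(node.size)  # account for the unbuilt subtree at the leaf


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 64, seed: int = 0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed
        self.trees: List[_Node] = []
        self.psi = 0

    def fit(self, data: List[Point]) -> "IsolationForest":
        if len(data) < 2:
            raise ValueError("need at least two reference samples")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit
        self.trees = [_build(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Point) -> float:
        """Anomaly score in (0, 1]; values near 1 mean x is isolated quickly."""
        mean_h = sum(_path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / _avg_path(self.psi))


def detect_drift(reference: List[Point], window: List[Point], threshold: float = 0.6,
                 max_outlier_frac: float = 0.2, seed: int = 0) -> bool:
    """Hypothetical drift rule: too many new samples look anomalous to a forest fit on history."""
    forest = IsolationForest(seed=seed).fit(reference)
    outliers = sum(1 for x in window if forest.score(x) > threshold)
    return outliers / len(window) > max_outlier_frac
```

With a reference window of standard-normal points, a new window shifted far away should trigger the flag, while a fresh window from the reference distribution should not. Note that this rule only detects a change in the input distribution; it does not model labeled feedback.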