Computer Engineering and Applications, 2018, Vol. 54, Issue (7): 11-19. DOI: 10.3778/j.issn.1002-8331.1711-0443

Survey of unsupervised clustering approach oriented to entity resolution

GAO Guangshang1,2   

  1. Research Center for Modern Enterprise Management, Guilin University of Technology, Guilin, Guangxi 541004, China
  2. School of Management, Guilin University of Technology, Guilin, Guangxi 541004, China
  • Online: 2018-04-01 Published: 2018-04-16

面向实体解析的无监督聚类方法综述

高广尚1,2   

  1. 桂林理工大学 现代企业管理研究中心,广西 桂林 541004
  2. 桂林理工大学 商学院,广西 桂林 541004

Abstract: This paper analyzes the mechanism of Entity Resolution (ER) from the perspective of unsupervised clustering. It first examines unsupervised clustering in terms of specific types and classical algorithms, then examines unsupervised incremental clustering in terms of improvements to classical algorithms and evolution analysis. Finally, it outlines the open problems that research on unsupervised clustering still needs to address. Unsupervised clustering techniques can not only address the clustering efficiency and quality problems of traditional entity resolution, but can also reuse existing clustering results to resolve rapidly evolving data incrementally, thereby meeting the pressing need for incremental entity resolution in big data environments. The survey does not analyze in depth the evaluation metrics for unsupervised clustering algorithms. Although unsupervised clustering methods for entity resolution have many advantages, they still face challenges in accuracy and scalability.

Key words: Entity Resolution (ER), unsupervised clustering, unsupervised incremental clustering

摘要: 旨在从无监督聚类角度分析实体解析过程的机制。从特定类型、经典算法角度研究了无监督聚类的思路;从经典算法改进、演化分析角度研究了无监督增量聚类的思路;最后,对无监督聚类研究下一步需要解决的问题进行了展望。无监督聚类技术不仅能很好地解决传统实体解析过程中存在的聚类效率和质量问题,而且还能利用已有的聚类结果对快速演化的数据进行增量解析,进而进一步满足大数据环境下亟需的增量解析需求。没有深入分析无监督聚类算法的评价指标,尽管面向实体解析的无监督聚类方法有诸多优势,但仍然面临着准确性和可扩展性等挑战。

关键词: 实体解析, 无监督聚类, 无监督增量聚类