Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (4): 122-125.

On contraction method to cleansing duplicates in graph

HUANG Li1, XIONG Xin2   

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430080, China
  2. College of Medical, Wuhan University of Science and Technology, Wuhan 430080, China
  • Online: 2014-02-15 Published: 2014-02-14

重复图数据收缩清理策略

黄  莉1,熊  欣2   

  1. 武汉科技大学 计算机科学与技术学院,武汉 430080
  2. 武汉科技大学 医学院,武汉 430080

Abstract: With the rapid development of linked data, the explosive growth of graph data has become a challenging problem, and graph data also contains duplicates. Duplicate detection is a hot topic in heterogeneous data integration and information retrieval, but little attention has been paid to cleansing duplicates once they have been detected. Because graph data is complex and highly interrelated, one record of a duplicate pair cannot simply be removed; dedicated cleansing methods are needed. This paper studies the problem and proposes a solution: a contraction method for cleansing duplicates in graphs. The method introduces graph contraction into duplicate cleansing and handles different situations with different solutions. Experiments on publication datasets show that the proposed method is efficient and preserves the relationships and stability of the graph.
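The abstract does not spell out the contraction procedure, so the following is only a minimal sketch of textbook vertex contraction applied to one detected duplicate pair, not the authors' algorithm. The function name `contract_duplicate`, the adjacency-set representation, and the toy publication graph are all assumptions made for illustration. The idea shown is that the dropped vertex's edges are rewired to the kept vertex rather than deleted, so no relationship is lost.

```python
def contract_duplicate(adj, keep, drop):
    """Merge vertex `drop` into vertex `keep` (hypothetical helper).

    `adj` is an undirected graph stored as a dict that maps each vertex
    to the set of its neighbours. The dict is modified in place and returned.
    """
    if keep == drop or drop not in adj:
        return adj
    adj.setdefault(keep, set())
    # Remove `drop` and hand each of its edges over to `keep`.
    for nbr in adj.pop(drop):
        if nbr == drop:
            continue  # ignore a self-loop on the dropped vertex
        adj[nbr].discard(drop)
        if nbr != keep:
            # Rewire the edge (drop, nbr) to (keep, nbr); sets absorb
            # any parallel edge that the merge would otherwise create.
            adj[nbr].add(keep)
            adj[keep].add(nbr)
    return adj


# Toy publication graph: two spellings of one author, each linked to a paper.
graph = {
    "paper1": {"J. Smith"},
    "paper2": {"John Smith"},
    "J. Smith": {"paper1"},
    "John Smith": {"paper2"},
}
contract_duplicate(graph, keep="John Smith", drop="J. Smith")
```

After the call, both papers are attached to the surviving "John Smith" vertex, whereas deleting "J. Smith" outright would have left "paper1" with no author.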

Key words: graph data integration, duplicates cleansing, graph contraction

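Abstract (Chinese version, translated): Duplicate data causes great trouble for data management and use. Graph data captures the relationships among data items well and represents a trend in how data is evolving. The detection of duplicate pairs has been studied extensively, but little work addresses merging and cleansing the pairs after detection. Because the associations in graph data are complex, arbitrarily removing one record of a pair disrupts the relationships among the data, which makes deduplication in graph data all the more important. To preserve the associations among graph data and the stability of the graph, this paper proposes, for use after duplicates have been detected, an integration and cleansing strategy suited to duplicate pairs in graph data. The strategy introduces graph contraction into the cleansing method and handles different cases with different procedures, so that the cleansed graph retains its connectivity and stability.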

Key words (Chinese version, translated): graph data integration and cleansing, duplicate data merging, graph contraction