计算机工程与应用 ›› 2021, Vol. 57 ›› Issue (15): 124-132.DOI: 10.3778/j.issn.1002-8331.2007-0107

• 大数据与云计算 • 上一篇    下一篇

基于最小描述长度原则的属性图概要方法

张陶,于炯,廖彬,毕雪华   

  1. 1.新疆大学 信息科学与工程学院,乌鲁木齐 830046
    2.新疆医科大学 医学工程技术学院,乌鲁木齐 830011
    3.新疆财经大学 统计与信息学院,乌鲁木齐 830012
  • 出版日期:2021-08-01 发布日期:2021-07-26

Method for Attributed Graph Summarization Based on Minimum Description Length

ZHANG Tao, YU Jiong, LIAO Bin, BI Xuehua   

  1. 1.School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
    2.College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi 830011, China
    3.College of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012, China
  • Online:2021-08-01 Published:2021-07-26

摘要:

图概要技术是管理、分析和可视化大规模图的关键技术之一。如何综合结构和属性信息进行图概要是一个挑战。大部分现有的图概要方法或者只考虑结构或属性某一方面的信息,或者要求属性的表现形式是一致的。结合信息论中最小描述长度原则,对属性图概要问题建模,将其转化为求解最小表示代价问题,以实现图压缩和图概要的双重目标。提出了一种计算节点属性相似性的方法,该属性度量方法对节点属性的限制较小,并且将节点间的相似性统一为存储代价,实现了节点结构相似和属性相似的协同考虑。提出了两种求解最小代价表示的图概要算法。在真实和合成的数据集上实验,验证了提出算法的有效性。

关键词: 图概要, 图聚集, 最小描述长度, 属性图, 节点相似性

Abstract:

Graph summarization is one of the key techniques of managing, analyzing and visualizing large-scale graphs. It is a challenge to summarize attribute graphs by integrating structure and attribute information. Most of the existing graph summarization methods either consider only one aspect of the structure or attribute information, or require that the representation of the attribute is consistent. Based on the principle of minimum description length in information theory, the attribute graph profile problem is modeled and transformed into solving the minimum representation cost problem to achieve the dual goals of graph compression and graph summarization. A method of calculating attribute similarity of nodes is proposed, which unifies the similarity between nodes as representation cost, and considers the neighborhood and attribute similarity of nodes. Two graph summarization algorithms for solving the minimum cost representation are proposed. Finally, experiments on real and synthetic datasets demonstrate the effectiveness of the proposed algorithm.

Key words: graph summarization, graph aggregation, minimum description length, attribute dgraph, node similarity