计算机工程与应用 ›› 2016, Vol. 52 ›› Issue (7): 79-85.

• 大数据与云计算 • 上一篇    下一篇

GPLP:基于标签传播的大图多级划分算法

张喆程,马自堂   

  1. 解放军信息工程大学三院,郑州 450000
  • 出版日期:2016-04-01 发布日期:2016-04-19

GPLP:Multi-level partition for large scale graph based on label propagation

ZHANG Zhecheng, MA Zitang   

  1. The Third Institute, PLA Information Engineering University, Zhengzhou 450000, China
  • Online:2016-04-01 Published:2016-04-19

摘要: 图数据划分问题是大图处理系统的关键问题,制约着图处理系统的计算效率。目前可用的划分算法可分为随机划分和多层次划分,已有的算法难以在划分速度和划分效果两个方面同时满足要求。提出了一种新的基于标签传播的多级划分算法GPLP,该方法将图划分过程分为数据标记、图粗糙化和数据迁移三部分,在多级划分框架下采用标签传播算法,并对其进行了改进。从数据划分时间和迭代计算时间两个方面对比GPLP算法、Hash算法和ParMETIS算法的性能,实验结果表明GPLP算法能够提高迭代计算速度,减少了划分时间,并且数据规模越大,其优势越明显。

关键词: 图划分, 标签传播, 多级划分算法, 大图

Abstract: Graph data partition is a critical part of the large scale graph and it restricts the efficiency of the system. Now the system usually uses random partition or multi-level partition, but the requirements of the two aforementioned aspects are hard to be satisfied in aspects of partition speed and partition effects. Therefore, this paper presents a new multi-level partition algorithm called GPLP (Graph Partitioning based on Label Propagation) based on the label propagation. GPLP consists of data marking, coarsening and data moving. Label propagation algorithm has been improved and used in multi-level structure. At last, some experiments have been done to compare the performance among GPLP, Hash and ParMETIS from multiple aspects. The results show that GPLP can improve the job executing efficiency and effectively solve the problems of large scale graph among partitions.

Key words: graph partition, label propagation, multi-level partition, large scale graph