计算机工程与应用 ›› 2018, Vol. 54 ›› Issue (4): 66-71.DOI: 10.3778/j.issn.1002-8331.1701-0197

• 大数据与云计算 • 上一篇    下一篇

基于通信负载均衡的社交网络图分割方法

刘  康1,张雪英1,李凤莲1,田玉楚1,2   

  1. 1.太原理工大学 信息工程学院,山西 晋中 030600
    2.昆士兰科技大学 电机工程及计算机科学学院,澳大利亚 布里斯班 4001
  • 出版日期:2018-02-15 发布日期:2018-03-07

Graph partitioning method for social networks based on communication load balancing

LIU Kang1, ZHANG Xueying1, LI Fenglian1, TIAN Yuchu1,2   

  1. 1.College of Information Engineering, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
    2.School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD 4001, Australia
  • Online:2018-02-15 Published:2018-03-07

摘要: 海量社交网络数据中蕴含着丰富的信息,图论是挖掘这些信息的重要方法之一。面对日益增多的图数据,分布式计算成为处理大规模图数据的有效手段。在分布式图计算中,通信所消耗的时间占有很大的比例,通过图分割算法的设计可以有效地降低通信量并实现负载均衡,从而提高分布式图计算的效率,典型的例子包括Metis图分割算法。但是,用现有的图分割算法处理非均衡图数据会造成各个子图之间通信量不均衡,从而影响了计算效率。为了解决这一问题,提出一种新的图分割方法:通信均衡标签交换方法。该方法在保持子图规模一致的基础上,既降低了全图计算所需的通信量,又使各个子图之间的通信量达到均衡。实验结果表明,与Metis等典型的图分割算法相比,提出的图分割方法在各种数据集和集群配置情况下,能降低6%~30%的图计算时间,充分显示了该方法的有效性。

关键词: 社交网络, 图论, 图分割, 分布式计算, 负载均衡

Abstract: Massive data from social networks contains a wealth of information. Among various methods to mine such information, graph theory is an attractive tool. With the increase of the volume of the graph data, distributed computation of graphs becomes an effective means to deal with large-scale graph data. In distributed graph computation, the time consumed in communications contributes significantly to the overall computation time. A well-designed graph partitioning algorithm can effectively reduce the communications as well as achieve load balancing, thus improving the efficiency of distributed graph computation. Typical examples include the Metis graph partitioning algorithm. However, existing graph partitioning algorithms to deal with social network graphs which involve non-equilibrium graph data will result in imbalance between subgraph communications, thus affecting the computational efficiency. To solve this problem, a new graph partitioning method, namely communication balanced label switching method, is presented. It behaves with three unique features: consistent subgraph scale, reduction of the communications required for the whole graph computation, and balanced communications between subgraphs. Experimental results show that in comparison with existing partitioning algorithms such as Metis, the graph partitioning method presented in this paper improves the computational time performance by 6%~30% for various data sets and cluster configurations. These results highlight the effectiveness of the presented method.

Key words: social network, graph theory, graph partitioning, distributed computing, load balancing