Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (7): 71-81. DOI: 10.3778/j.issn.1002-8331.1810-0146

• Big Data and Cloud Computing •

Shortest Distance Query Algorithm for Large-Scale Edge Restricted Graph Data

LV Wei (吕伟), SONG Wenai (宋文爱), FU Lizhen (富丽贞), XU Wen (许文)

  1. School of Software, North University of China, Taiyuan 030051, China
  • Online: 2019-04-01 Published: 2019-04-15

Abstract: Computing the shortest distance between two nodes is a basic operation on labeled graphs. For large graphs, query efficiency is commonly improved by estimating the distance from precomputed distances to landmark nodes. Existing landmark selection strategies cannot achieve high centrality and low computational cost at the same time, and storing the distance from every landmark to every other node still requires a large amount of space. For large-scale directed graphs, the proposed landmark selection strategy preserves centrality while reducing computation: a DBSCAN-style clustering divides the nodes into clusters, and connected forward and backward core nodes are selected as forward and backward landmarks. To reduce storage, only the distances between landmarks and ordinary nodes within the same cluster, and the distances between landmarks of different clusters, are stored. At query time, the source node reaches the target node through backward and forward landmarks, and the minimum mean of the upper and lower bounds is used as the estimated distance. Theoretical analysis shows that the time complexity of landmark selection and the storage requirement are each reduced by an order of magnitude compared with traditional methods. Experiments show that, on large graphs, the average relative error is of the same order of magnitude as that of traditional methods.

Key words: graph data, edge restricted, preprocessing, shortest distance query
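
The landmark-based estimate described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an unweighted directed graph stored as an adjacency dictionary, ignores edge labels and the DBSCAN-style clustering, and takes the landmark set as given. The names `bfs_distances`, `reverse_graph`, `preprocess` and `estimate_distance` are hypothetical. For a landmark l, the triangle inequality gives the upper bound d(s,l) + d(l,t) and the lower bounds d(l,t) - d(l,s) and d(s,l) - d(t,l); the sketch reads "minimum mean of the upper and lower bounds" as the smallest per-landmark average of the two, which is one possible interpretation.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source along the edges of adj (node -> successors)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def reverse_graph(adj):
    """Return the graph with every edge direction flipped."""
    rev = {}
    for u, succs in adj.items():
        rev.setdefault(u, [])
        for v in succs:
            rev.setdefault(v, []).append(u)
    return rev


def preprocess(adj, landmarks):
    """Precompute d(l, v) and d(v, l) for every landmark l (assumed given)."""
    rev = reverse_graph(adj)
    from_lm = {l: bfs_distances(adj, l) for l in landmarks}  # d(l, v)
    to_lm = {l: bfs_distances(rev, l) for l in landmarks}    # d(v, l)
    return from_lm, to_lm


def estimate_distance(index, s, t):
    """Smallest per-landmark mean of the upper and lower bound on d(s, t).

    Returns None when no landmark lies on any path from s to t.
    """
    from_lm, to_lm = index
    best = None
    for l in from_lm:
        d_sl = to_lm[l].get(s)
        d_lt = from_lm[l].get(t)
        if d_sl is None or d_lt is None:
            continue  # this landmark gives no path s -> l -> t
        upper = d_sl + d_lt
        lower = 0
        d_ls = from_lm[l].get(s)
        if d_ls is not None:
            lower = max(lower, d_lt - d_ls)  # d(s,t) >= d(l,t) - d(l,s)
        d_tl = to_lm[l].get(t)
        if d_tl is not None:
            lower = max(lower, d_sl - d_tl)  # d(s,t) >= d(s,l) - d(t,l)
        mean = (upper + lower) / 2
        if best is None or mean < best:
            best = mean
    return best


# Toy directed 4-cycle a -> b -> c -> d -> a with a single landmark "c".
cycle = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
index = preprocess(cycle, ["c"])
print(estimate_distance(index, "a", "c"))  # 2.0 (bounds coincide, exact)
print(estimate_distance(index, "a", "d"))  # 1.5 (true distance is 3)
```

The second query shows why landmark placement matters: with one poorly placed landmark the lower bound collapses to 0 and the estimate falls well below the true distance, which is the kind of error a centrality-aware selection strategy aims to limit.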